Distance Covariance in Metric Spaces

by Russell Lyons 



Abstract. We extend the theory of distance (Brownian) covariance from 
Euclidean spaces, where it was introduced by Székely, Rizzo and Bakirov, to
general metric spaces. We show that for testing independence, it is necessary 
and sufficient that the metric space be of strong negative type. In particular, 
we show that this holds for separable Hilbert spaces, which answers a question 
of Kosorok. Instead of the manipulations of Fourier transforms used in the 
original work, we use elementary inequalities for metric spaces and embeddings 
in Hilbert spaces. 



§1. Introduction.



Székely, Rizzo, and Bakirov (2007) introduced a new statistical test for the following problem: given IID samples of a pair of random variables $(X, Y)$, where $X$ and $Y$ have finite first moments, are $X$ and $Y$ independent? Among the virtues of their test is that it is extremely simple to compute, based merely on a quadratic polynomial of the distances between points in the sample, and that it is consistent against all alternatives (with finite first moments). The test statistic is based on a new notion called "distance covariance" or "distance correlation". The paper by Székely and Rizzo (2009) introduced another new notion, "Brownian covariance", and showed it to be the same as distance covariance. That paper also gave more examples of its use. This latter paper elicited such interest that it was accompanied by a 3-page editorial introduction and 42 pages of comments.

Although the theory presented in those papers is very beautiful, it also gives the impression of being rather technical, relying on various manipulations with Fourier transforms and arcane integrals. Answering a question from Székely (personal communication, 2010), we show that almost the entire theory can be developed for general metric spaces, where it necessarily becomes much more elementary and transparent. A crucial point of the theory is that the distance covariance of $(X, Y)$ is 0 iff $X$ and $Y$ are independent. This does not hold for general metric spaces, but we characterize those for which it does hold. Namely, they are the metric spaces that have what we term "strong negative type".

In fact, negative type had arisen already in the work of Székely, Rizzo, and Bakirov (2007), hereinafter referred to as [SRB]. It was especially prominent in its predecessors, Székely and Rizzo (2005a, 2005b). The notion of strict negative type is standard, but we need a strengthening of it that we term "strong negative type". (These notions were conflated in [SRB] and Székely and Rizzo (2005a, 2005b).)

The concept of negative type is old, but has enjoyed a resurgence of interest recently due to its uses in theoretical computer science, where embeddings of metric spaces, such as graphs, play a useful role in algorithms; see, e.g., Naor (2010) and Deza and Laurent (1997). The fact that Euclidean space has negative type is behind the following charming and venerable puzzle: Given $n$ red points $x_i$ and $n$ blue points $x'_i$ in $\mathbb R^p$, show that the sum $2\sum_{i,j} \|x_i - x'_j\|$ of the distances between the $2n^2$ ordered pairs of points of opposite color is at least the sum $\sum_{i,j} \big(\|x_i - x_j\| + \|x'_i - x'_j\|\big)$ of the distances between the $2n^2$ ordered pairs of points of the same color. The reason the solution is not obvious is that it requires a special property of Euclidean space. The connection to embeddings is that, as Schoenberg (1937, 1938) showed, negative type is equivalent to a certain property of embeddability into Hilbert space. Indeed, if distance in the puzzle were replaced by squared distance, it would be easy.
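As a quick numerical illustration (not part of the argument), the following Python sketch draws random red and blue point sets in $\mathbb R^3$ and evaluates the difference between the two sides of the puzzle. It also checks the "easy" squared-distance variant, where the difference works out to exactly $2\|\sum_i x_i - \sum_i x'_i\|^2$. The helper names (`puzzle_gap`, `random_points`, and so on) are hypothetical.

```python
import math
import random


def euclid(p, q):
    """Euclidean distance between two points of R^p given as tuples."""
    return math.dist(p, q)


def sq_euclid(p, q):
    """Squared Euclidean distance (the 'easy' variant of the puzzle)."""
    return math.dist(p, q) ** 2


def puzzle_gap(red, blue, dist=euclid):
    """2 * sum_{i,j} dist(x_i, x'_j) minus sum_{i,j} [dist(x_i, x_j) + dist(x'_i, x'_j)].

    The puzzle asserts this is >= 0 when dist is the Euclidean distance.
    """
    cross = sum(dist(r, b) for r in red for b in blue)
    same = sum(dist(r, s) for r in red for s in red)
    same += sum(dist(b, c) for b in blue for c in blue)
    return 2 * cross - same


def random_points(rng, n, p):
    """n points in R^p with independent standard normal coordinates."""
    return [tuple(rng.gauss(0.0, 1.0) for _ in range(p)) for _ in range(n)]


rng = random.Random(0)
gaps = [puzzle_gap(random_points(rng, 5, 3), random_points(rng, 5, 3))
        for _ in range(200)]
print("smallest gap over 200 trials:", min(gaps))

# With squared distances the gap equals 2 * ||sum x_i - sum x'_i||^2.
red, blue = random_points(rng, 4, 2), random_points(rng, 4, 2)
shift = [sum(r[k] for r in red) - sum(b[k] for b in blue) for k in range(2)]
print(puzzle_gap(red, blue, sq_euclid), 2 * sum(c * c for c in shift))
```

Random trials of course prove nothing; they only make the inequality concrete.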

If we replace the sums of distances in the puzzle by averages, and then replace the two finite sets of points by two probability distributions (with finite first moments), we arrive at an equivalent property, called negative type. The condition that equality holds only when the two distributions are equal is called "strong negative type". It means that a simple computation involving average distances allows one to distinguish any two probability distributions. Many statistical tests are aimed at distinguishing two probability distributions, or distinguishing two families of distributions. This is what lies directly behind the tests in Székely and Rizzo (2005a, 2005b). It is also what lies behind the papers Bakirov, Rizzo, and Székely (2006), [SRB], and Székely and Rizzo (2009), but there it is somewhat hidden. We bring this out more clearly in showing how distance covariance allows a test for independence precisely when the two marginal distributions lie in metric spaces of strong negative type.

In Section 2, we define distance covariance and prove its basic properties for general metric spaces. This includes a statistical test for independence, but it is consistent against all alternatives only in the case of spaces of strong negative type, as explained in Section 3. In Section 3, we also sketch short proofs of Schoenberg's theorem and short solutions of the above puzzle (none being original). It turns out that various embeddings into Hilbert space, though necessarily equivalent at the abstract level, are useful for different specific purposes. In both sections, we separate needed results from other interesting results by putting the latter in explicit remarks. We show that the full theory extends to separable-Hilbert-space-valued random variables, which resolves a question of Kosorok (2009). We remark at the end of the paper that if $(\mathcal X, d)$ is a metric space of negative type, then $(\mathcal X, d^r)$ has strong negative type for all $r \in (0, 1)$; this means that if in a given application one has negative type but not strong negative type (for example, in an $L^1$ metric space), then a simple modification of the metric allows the full theory to apply.



§2. General Metric Spaces. 

Let $(\mathcal X, d)$ be a metric space. Let $M(\mathcal X)$ denote the finite signed Borel measures on $\mathcal X$ and $M_1(\mathcal X)$ be the subset of probability measures. We say that $\mu \in M(\mathcal X)$ has a finite first moment if $\int d(o, x)\, d|\mu|(x) < \infty$ for some $o \in \mathcal X$. The choice of $o \in \mathcal X$ does not matter by virtue of the triangle inequality. If $\mu, \mu' \in M(\mathcal X)$ both have finite first moments, then $\int d(x, x')\, d(|\mu| \times |\mu'|)(x, x') < \infty$ since $d(x, x') \le d(o, x) + d(o, x')$. Therefore, $\int d(x, x')\, d\mu(x)\, d\mu'(x')$ is defined and finite. In particular, we may define

$$a_\mu(x) := \int d(x, x')\, d\mu(x')$$

and

$$D(\mu) := \int d(x, x')\, d\mu^2(x, x')$$

as finite numbers when $\mu \in M(\mathcal X)$ has a finite first moment. Also, write

$$d_\mu(x, x') := d(x, x') - a_\mu(x) - a_\mu(x') + D(\mu).$$
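For a finitely supported $\mu = \sum_i w_i \delta_{x_i}$ every integral above is a finite sum. The sketch below, whose function name `centered_distances` is a hypothetical helper, computes $a_\mu$, $D(\mu)$ and $d_\mu$ directly from these definitions and confirms that $d_\mu$ is "doubly centered": integrating out either variable against $\mu$ gives 0.

```python
def centered_distances(points, weights, dist):
    """Return (a, D, d_mu) for mu = sum_i weights[i] * delta_{points[i]}.

    a[i]       = a_mu(x_i)  = sum_j w_j d(x_i, x_j)
    D          = D(mu)      = sum_i w_i a_mu(x_i)
    d_mu[i][j] = d(x_i, x_j) - a_mu(x_i) - a_mu(x_j) + D(mu)
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    a = [sum(weights[j] * d[i][j] for j in range(n)) for i in range(n)]
    D = sum(weights[i] * a[i] for i in range(n))
    d_mu = [[d[i][j] - a[i] - a[j] + D for j in range(n)] for i in range(n)]
    return a, D, d_mu


# Example: three atoms on the real line with unequal weights.
points = [0.0, 1.0, 3.0]
weights = [0.5, 0.25, 0.25]
a, D, d_mu = centered_distances(points, weights, lambda u, v: abs(u - v))
print(a, D)

# Integrating d_mu(x_i, .) against mu gives a_i - a_i - D + D = 0.
row_integrals = [sum(weights[j] * d_mu[i][j] for j in range(3)) for i in range(3)]
print(row_integrals)
```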
The function $d_\mu$ is better behaved than $d$ in the following sense:

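Lemma 2.1. Let $\mathcal X$ be any metric space. If $\mu \in M_1(\mathcal X)$ has a finite first moment, i.e., $d(x, x') \in L^1(\mu \times \mu)$, then $d_\mu(x, x') \in L^2(\mu \times \mu)$.

Proof. For simplicity, write $a(x) := a_\mu(x)$ and $a := D(\mu)$. Let $X, X' \sim \mu$ be independent. By the triangle inequality, we have

$$|d(x, x') - a(x)| \le a(x'), \tag{2.1}$$

so that $d(X, X') - a(X) - a(X') + a$ lies between $a - 2a(X')$ and $a$, and, by symmetry, also between $a - 2a(X)$ and $a$. Hence

$$\int d_\mu(x, x')^2\, d\mu^2(x, x') = \mathbf E\big[(d(X, X') - a(X) - a(X') + a)^2\big] \le \mathbf E[X_1 X_2],$$

where $X_1 := \max\{|a - 2a(X')|, a\}$ and $X_2 := \max\{|a - 2a(X)|, a\}$. Since $X_1$ and $X_2$ are integrable and independent, $X_1 X_2$ is also integrable, with $\mathbf E[X_1 X_2] \le 4a^2$. ∎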






The proof of Lemma 2.1 shows that $\|d_\mu\|_2 \le 2D(\mu) = 2\|d\|_1$, but the factor of 2 will be removed in Proposition 2.3.



We call $\mu \in M(\mathcal X)$ degenerate if its support consists of only a single point.

Remark 2.2. Let $\mu \in M_1(\mathcal X)$ have finite first moment and be non-degenerate. Although $d_\mu(x, x') < d(x, x')$ for all $x, x' \in \mathcal X$, it is not true that $|d_\mu(x, x')| \le d(x, x')$ for all $x, x'$ in the support of $\mu$. To see these, we prove first that

$$a_\mu(x) > D(\mu)/2 \tag{2.2}$$

for all $x \in \mathcal X$. Indeed, $D(\mu) = \int d(x', x'')\, d\mu^2(x', x'') \le \int [d(x', x) + d(x, x'')]\, d\mu^2(x', x'') = 2a_\mu(x)$. Furthermore, if equality holds, then $d(x', x'') = d(x', x) + d(x, x'')$ for all $x', x''$ in the support of $\mu$. Put $x' = x''$ to get that $x = x'$, contradicting that $\mu$ is not degenerate. This proves (2.2). Using (2.2) twice in the definition of $d_\mu$ gives $d_\mu < d$. On the other hand, (2.2) also shows that $d_\mu(x, x) = D(\mu) - 2a_\mu(x) < 0 = -d(x, x)$ for all $x$.
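The three claims of Remark 2.2 can be checked on a small example. The sketch below (the function name `remark_2_2_checks` is hypothetical) takes the uniform measure on a few distinct points of the plane, which is non-degenerate, and tests (2.2), the bound $d_\mu < d$, and the negativity of $d_\mu(x, x)$ at the atoms.

```python
import math
import random


def remark_2_2_checks(points):
    """For the uniform measure mu on distinct points, test the claims of Remark 2.2."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    a = [sum(row) / n for row in d]        # a_mu(x_i)
    D = sum(a) / n                         # D(mu)
    d_mu = [[d[i][j] - a[i] - a[j] + D for j in range(n)] for i in range(n)]
    claim_22 = all(ai > D / 2 for ai in a)                                   # (2.2)
    below_d = all(d_mu[i][j] < d[i][j] for i in range(n) for j in range(n))  # d_mu < d
    negative_diag = all(d_mu[i][i] < 0 for i in range(n))                    # so |d_mu| <= d fails
    return claim_22, below_d, negative_diag


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(6)]
print(remark_2_2_checks(pts))
print(remark_2_2_checks([(0.0,), (1.0,)]))   # two atoms on the line
```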



Now let $(\mathcal Y, d)$ be another metric space. Let $\theta \in M_1(\mathcal X \times \mathcal Y)$ have finite first moments for each of its marginals $\mu$ on $\mathcal X$ and $\nu$ on $\mathcal Y$. Define

$$\delta_\theta\big((x, y), (x', y')\big) := d_\mu(x, x')\, d_\nu(y, y').$$

By Lemma 2.1 and the Cauchy-Schwarz inequality, we may define

$$\operatorname{dcov}(\theta) := \int \delta_\theta\big((x, y), (x', y')\big)\, d\theta^2\big((x, y), (x', y')\big).$$

It is immediate from the definition that if $\theta$ is a product measure, then $\operatorname{dcov}(\theta) = 0$; the converse statement is not always true and is the key topic of the theory. Metric spaces for which the converse holds are characterized in Section 3 as those of strong negative type. Similarly, spaces for which $\operatorname{dcov} \ge 0$ are characterized in Section 3 as those of negative type. [SRB] call the square root of $\operatorname{dcov}(\theta)$ the distance covariance of $\theta$, but they work only in the context of Euclidean spaces, where $\operatorname{dcov} \ge 0$. They denote that square root by $\operatorname{dCov}(\theta)$.

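When $(X, Y)$ are random variables with distribution $\theta \in M_1(\mathcal X \times \mathcal Y)$, we also write $\operatorname{dcov}(X, Y) := \operatorname{dcov}(\theta)$. If $(X, Y)$ and $(X', Y')$ are independent, both with distribution $\theta$ having marginals $\mu$ and $\nu$, then

$$\operatorname{dcov}(\theta) = \mathbf E\Big[\big(d(X, X') - a_\mu(X) - a_\mu(X') + D(\mu)\big)\big(d(Y, Y') - a_\nu(Y) - a_\nu(Y') + D(\nu)\big)\Big].$$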
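For the empirical measure $\theta_n$ of pairs $(x_i, y_i)$, the formula above becomes a finite average of products of doubly centered distances. The following sketch (helper names `double_centered` and `dcov_empirical` are hypothetical) computes $\operatorname{dcov}(\theta_n)$ this way. It checks two special cases: a sample whose empirical measure is exactly a product measure (all pairs from a grid), for which $\operatorname{dcov}(\theta_n) = 0$, and a sample with $Y = X$, for which the value is positive.

```python
def double_centered(values, dist):
    """Matrix of d_{mu_n}(v_i, v_j) for the empirical measure mu_n of `values`."""
    n = len(values)
    d = [[dist(u, v) for v in values] for u in values]
    row = [sum(r) / n for r in d]          # a_{mu_n}(v_i)
    grand = sum(row) / n                   # D(mu_n)
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def dcov_empirical(xs, ys, dx, dy):
    """dcov(theta_n) for the empirical measure theta_n of the pairs (xs[i], ys[i])."""
    n = len(xs)
    A = double_centered(xs, dx)
    B = double_centered(ys, dy)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2


def absdiff(u, v):
    return abs(u - v)


# Empirical measure equal to a product measure: all 3 x 2 pairs of a grid.
grid = [(x, y) for x in (0.0, 1.0, 4.0) for y in (2.0, -1.0)]
xs, ys = [p[0] for p in grid], [p[1] for p in grid]
print(dcov_empirical(xs, ys, absdiff, absdiff))   # zero up to rounding

# A perfectly dependent sample (Y = X).
zs = [0.3, -1.2, 2.5, 0.7, 1.9]
print(dcov_empirical(zs, zs, absdiff, absdiff))   # positive
```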
The following generalizes (2.5) of [SRB].
Proposition 2.3. Let $\mathcal X$ and $\mathcal Y$ be any metric spaces. Let $\theta \in M_1(\mathcal X \times \mathcal Y)$ have finite first moments for each of its marginals $\mu$ on $\mathcal X$ and $\nu$ on $\mathcal Y$. Let $(X, Y) \sim \theta$. Then

$$|\operatorname{dcov}(X, Y)| \le \sqrt{\operatorname{dcov}(X, X)\, \operatorname{dcov}(Y, Y)} \le D(\mu)\, D(\nu). \tag{2.3}$$

Furthermore, $\operatorname{dcov}(X, X) = D(\mu)^2$ iff $\mu$ is concentrated on at most two points.



Proof. The Cauchy-Schwarz inequality shows the first inequality of (2.3). It remains to show that

$$\operatorname{dcov}(X, X) \le D(\mu)^2 \tag{2.4}$$

and to analyze the case of equality. As before, write $a(x) := a_\mu(x)$ and $a := D(\mu)$. By (2.1), we have

$$\mathbf E\big[|d(X, X') - a(X)|\, a(X)\big] \le \mathbf E[a(X') a(X)] = a^2 < \infty,$$

whence $\mathbf E\big[[d(X, X') - a(X)]\, a(X)\big] = 0$ by Fubini's theorem (i.e., condition on $X$). Similarly, $\mathbf E\big[[d(X, X') - a(X')]\, a(X')\big] = 0$. Thus, expanding the square in $\operatorname{dcov}(X, X) = \mathbf E\big[(d(X, X') - a(X) - a(X') + a)^2\big]$ and replacing $d(X, X')^2$ there by the larger quantity $d(X, X')[a(X) + a(X')]$ yields $[d(X, X') - a(X)]\, a(X) + [d(X, X') - a(X')]\, a(X')$ plus other terms that are individually integrable with integrals summing to $a^2$. This shows the inequality (2.4). Furthermore, it shows that equality holds iff for all points $x, x'$ in the support of $\mu$, if $d(x, x') \ne 0$, then $d(x, x') = a(x) + a(x')$. Since the right-hand side equals $\int [d(x, o) + d(o, x')]\, d\mu(o)$, it follows that $d(x, x') = d(x, o) + d(o, x')$ for all $o$ in the support of $\mu$. If there is an $o \ne x, x'$ in the support of $\mu$, then we similarly have that $d(x, o) = d(x, x') + d(x', o)$. Adding these equations together shows that $d(o, x') = 0$, a contradiction. That is, if $\operatorname{dcov}(X, X) = D(\mu)^2$, then the support of $\mu$ has size 1 or 2. The converse is clear. ∎
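The equality case of Proposition 2.3 is easy to see numerically. In the sketch below (the helper `dcov_self` is hypothetical), $\operatorname{dcov}(X, X) = \sum_{i,j} w_i w_j\, d_\mu(x_i, x_j)^2$ for a finitely supported $\mu$. With two atoms it equals $D(\mu)^2$ for any weights; with the three equally weighted atoms $0, 1, 3$ on the line a hand computation gives $\operatorname{dcov}(X, X) = 32/27 < 16/9 = D(\mu)^2$.

```python
def dcov_self(points, weights, dist):
    """Return (dcov(X, X), D(mu)) for X ~ mu = sum_i weights[i] * delta_{points[i]}."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    a = [sum(weights[j] * d[i][j] for j in range(n)) for i in range(n)]
    D = sum(weights[i] * a[i] for i in range(n))
    dcov_xx = sum(weights[i] * weights[j] * (d[i][j] - a[i] - a[j] + D) ** 2
                  for i in range(n) for j in range(n))
    return dcov_xx, D


def absdiff(u, v):
    return abs(u - v)


# Two atoms (arbitrary weights): equality dcov(X, X) = D(mu)^2.
two = dcov_self([0.0, 2.5], [0.3, 0.7], absdiff)
# Three atoms: strict inequality dcov(X, X) < D(mu)^2.
three = dcov_self([0.0, 1.0, 3.0], [1 / 3, 1 / 3, 1 / 3], absdiff)
print(two, three)
```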



The next proposition generalizes Theorem 4(i) of [SRB].



Proposition 2.4. If $\operatorname{dcov}(X, X) = 0$, then $X$ is degenerate.

Proof. As before, write $a(x) := a_\mu(x)$ and $a := D(\mu)$, where $X \sim \mu$. The hypothesis implies that $d(X, X') - a(X) - a(X') + a = 0$ a.s. Since all functions here are continuous, we have $d(x, x') - a(x) - a(x') + a = 0$ for all $x, x'$ in the support of $\mu$. Put $x = x'$ to deduce that for all $x$ in the support of $\mu$, we have $a(x) = a/2$. Therefore, $d(X, X') = 0$ a.s. ∎
Assume that $\mu$ and $\nu$ are non-degenerate. Then the right-hand side of (2.3) is not 0; the quotient $\operatorname{dcov}(\theta)/[D(\mu) D(\nu)]$ is the square of what is called the distance correlation of $\theta$ in [SRB]. In [SRB], this quotient is always non-negative.

This next proposition extends Theorem 3(iii) of [SRB].



Proposition 2.5. If $\mu$ and $\nu$ are non-degenerate and equality holds in the first inequality of (2.3), then for some $c > 0$, there is a continuous map $f \colon \mathcal X \to \mathcal Y$ such that for all $x, x'$ in the support of $\mu$, we have $d(x, x') = c\, d(f(x), f(x'))$ and $y = f(x)$ for $\theta$-a.e. $(x, y)$.

Proof. Write $a(x) := a_\mu(x)$, $a := D(\mu)$, $b(y) := a_\nu(y)$, and $b := D(\nu)$. Equality holds in the first inequality of (2.3) iff there is some constant $c$ such that

$$d(x, x') - a(x) - a(x') + a = c\big(d(y, y') - b(y) - b(y') + b\big)$$

for $\theta^2$-a.e. $\big((x, y), (x', y')\big)$, i.e.,

$$d(x, x') - c\, d(y, y') = a(x) - c\, b(y) + a(x') - c\, b(y') + cb - a.$$

Since all functions here are continuous, the same holds for all $(x, y), (x', y')$ in the support of $\theta$. Put $(x, y) = (x', y')$ to deduce that for all $(x, y)$ in the support of $\theta$, we have $a(x) - c\, b(y) = (a - cb)/2$. This means that $d(x, x') = c\, d(y, y')$ $\theta^2$-a.s. The conclusion follows. ∎



We now extend Theorem 2 of [SRB].

Proposition 2.6. Let $\mathcal X$ and $\mathcal Y$ be metric spaces. Let $\theta \in M_1(\mathcal X \times \mathcal Y)$ have marginals with finite first moment. Let $\theta_n$ be the (random) empirical measure of the first $n$ samples from an infinite sequence of IID samples of $\theta$. Then $\operatorname{dcov}(\theta_n) \to \operatorname{dcov}(\theta)$ a.s.

Proof. Let $(X^i, Y^i) \sim \theta$ be independent for $1 \le i \le 6$. Write

$$f(z_1, z_2, z_3, z_4) := d(z_1, z_2) - d(z_1, z_3) - d(z_2, z_4) + d(z_3, z_4).$$

Here, $z_i \in \mathcal X$ or $z_i \in \mathcal Y$. The triangle inequality gives that

$$|f(z_1, z_2, z_3, z_4)| \le g(z_1, z_3, z_4) := 2\max\{d(z_3, z_4), d(z_1, z_3)\}$$

and

$$|f(z_1, z_2, z_3, z_4)| \le g(z_2, z_4, z_3) = 2\max\{d(z_3, z_4), d(z_2, z_4)\}.$$

Since $g(X^1, X^3, X^4)$ and $g(Y^2, Y^6, Y^5)$ are integrable and independent, it follows that

$$h\big((X^1, Y^1), \ldots, (X^6, Y^6)\big) := f(X^1, X^2, X^3, X^4)\, f(Y^1, Y^2, Y^5, Y^6)$$

is integrable. Fubini's theorem thus shows that its expectation equals $\operatorname{dcov}(\theta)$: integrating out $(X^3, X^4)$ and $(Y^5, Y^6)$ turns the two factors into $d_\mu(X^1, X^2)$ and $d_\nu(Y^1, Y^2)$. Similarly, $\operatorname{dcov}(\theta_n)$ are the V-statistics for the kernel $h$ of degree 6. Hence, the result follows. ∎
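Since $\operatorname{dcov}(\theta_n)$ is the V-statistic of the degree-6 kernel $h$, a brute-force average of $h$ over all $n^6$ index tuples must agree with the doubly centered formula. The sketch below checks this for $n = 4$ (4096 tuples), and also spot-checks the bound $|f| \le 2\max\{d(z_3, z_4), d(z_1, z_3)\}$ on random quadruples of reals. The function names are hypothetical.

```python
import itertools
import random


def f(d, z1, z2, z3, z4):
    """f(z1, z2, z3, z4) = d(z1, z2) - d(z1, z3) - d(z2, z4) + d(z3, z4)."""
    return d(z1, z2) - d(z1, z3) - d(z2, z4) + d(z3, z4)


def dcov_via_kernel(xs, ys, d):
    """V-statistic of h = f(X^1, X^2, X^3, X^4) f(Y^1, Y^2, Y^5, Y^6) over all index 6-tuples."""
    n = len(xs)
    total = 0.0
    for i1, i2, i3, i4, i5, i6 in itertools.product(range(n), repeat=6):
        total += f(d, xs[i1], xs[i2], xs[i3], xs[i4]) * f(d, ys[i1], ys[i2], ys[i5], ys[i6])
    return total / n ** 6


def dcov_via_centering(xs, ys, d):
    """The same quantity computed from doubly centered distance matrices."""
    n = len(xs)

    def centered(v):
        m = [[d(p, q) for q in v] for p in v]
        row = [sum(r) / n for r in m]
        grand = sum(row) / n
        return [[m[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

    A, B = centered(xs), centered(ys)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2


def absdiff(u, v):
    return abs(u - v)


rng = random.Random(2)
xs = [rng.gauss(0, 1) for _ in range(4)]
ys = [x ** 2 + 0.1 * rng.gauss(0, 1) for x in xs]   # a dependent sample, n = 4
print(dcov_via_kernel(xs, ys, absdiff), dcov_via_centering(xs, ys, absdiff))

# The integrability bound |f| <= 2 max{d(z3, z4), d(z1, z3)}.
quads = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(1000)]
bound_ok = all(abs(f(absdiff, *q)) <= 2 * max(abs(q[2] - q[3]), abs(q[0] - q[2])) + 1e-12
               for q in quads)
print(bound_ok)
```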
The proof of Proposition 2.6 for general metric spaces is more straightforward if second moments are finite, as in Remark 3 of [SRB].



We next extend Theorem 5 of [SRB].



Theorem 2.7. Let $\mathcal X$, $\mathcal Y$ be metric spaces. Let $\theta \in M_1(\mathcal X \times \mathcal Y)$ have marginals $\mu$, $\nu$ with finite first moment. Let $\theta_n$ be the empirical measure of the first $n$ samples from an infinite sequence of IID samples of $\theta$. Let $\lambda_i$ be the eigenvalues (with multiplicity) of the map that sends $F \in L^2(\theta)$ to the function

$$(x, y) \mapsto \int \delta_\theta\big((x, y), (x', y')\big)\, F(x', y')\, d\theta(x', y').$$

If $\theta = \mu \times \nu$, then $n \operatorname{dcov}(\theta_n) \Rightarrow \sum_i \lambda_i Z_i^2$, where $Z_i$ are IID standard normal random variables and $\sum_i \lambda_i = D(\mu) D(\nu)$.



Proof. We use the same notation as in the proof of Proposition 2.6. That proof shows that $h$ is integrable when $\mu$ and $\nu$ have finite first moments; the case $X^i = Y^i$ shows then that $f(X^1, X^2, X^3, X^4)$ has finite second moment. Therefore, when $\theta = \mu \times \nu$, $h\big((X^1, Y^1), \ldots, (X^6, Y^6)\big)$ has finite second moment.

Assume now that $\theta = \mu \times \nu$. Then the kernel $h$ is degenerate of order 1. Let $\bar h$ be the symmetrized version of $h$. Then since $\theta = \mu \times \nu$,

$$\bar h_2\big((x, y), (x', y')\big) := \mathbf E\Big[\bar h\big((x, y), (x', y'), (X^3, Y^3), \ldots, (X^6, Y^6)\big)\Big] = \delta_\theta\big((x, y), (x', y')\big)/15.$$

Hence the result follows from the well-known theory of degenerate V-statistics (compare Theorem 5.5.2 in Serfling (1980) or Example 12.11 in van der Vaart (1998) for the case of U-statistics). Finally, we have $\sum_i \lambda_i = \int \delta_\theta\big((x, y), (x, y)\big)\, d\theta(x, y) = D(\mu) D(\nu)$ since $\theta = \mu \times \nu$. ∎

Corollary 2.8. Let $\mathcal X$, $\mathcal Y$ be metric spaces. Let $\theta \in M_1(\mathcal X \times \mathcal Y)$ have non-degenerate marginals $\mu$, $\nu$ with finite first moment. Let $\theta_n$ be the empirical measure of the first $n$ samples from an infinite sequence of IID samples of $\theta$. Let $\mu_n$, $\nu_n$ be the marginals of $\theta_n$. If $\theta = \mu \times \nu$, then

$$\frac{n \operatorname{dcov}(\theta_n)}{D(\mu_n)\, D(\nu_n)} \Rightarrow \frac{\sum_i \lambda_i Z_i^2}{D(\mu)\, D(\nu)}, \tag{2.5}$$

where $\lambda_i$ and $Z_i$ are as in Theorem 2.7 and the right-hand side has expectation 1. If $\operatorname{dcov}(\theta) \ne 0$, then the left-hand side of (2.5) tends to $\pm\infty$ a.s.

Proof. Since $D(\mu_n)$ and $D(\nu_n)$ are V-statistics, we have $D(\mu_n) \to D(\mu)$ and $D(\nu_n) \to D(\nu)$ a.s. Thus, the first case follows from Theorem 2.7. The second case follows from Proposition 2.6. ∎
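The left-hand side of (2.5) is directly computable from a sample. The sketch below computes it for real-valued data with $d(u, v) = |u - v|$. To turn it into a test it uses a permutation calibration (permuting the $y$-values), which is a common practical choice but is our assumption, not the asymptotic law of Corollary 2.8. All function names are hypothetical. For $Y = X$, Proposition 2.3 bounds the statistic by $n$, and since any non-identity permutation strictly lowers $\operatorname{dcov}$ here, the permutation p-value is tiny.

```python
import random


def centered(values):
    """Doubly centered |u - v| matrix for real-valued data, together with D(mu_n)."""
    n = len(values)
    m = [[abs(u - v) for v in values] for u in values]
    row = [sum(r) / n for r in m]
    grand = sum(row) / n                    # D(mu_n)
    return [[m[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)], grand


def normalized_statistic(xs, ys):
    """Left-hand side of (2.5): n * dcov(theta_n) / (D(mu_n) D(nu_n))."""
    n = len(xs)
    A, Dx = centered(xs)
    B, Dy = centered(ys)
    dcov_n = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    return n * dcov_n / (Dx * Dy)


def permutation_pvalue(xs, ys, n_perm, rng):
    """Calibrate the statistic by randomly permuting ys (assumed calibration method)."""
    observed = normalized_statistic(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if normalized_statistic(xs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


rng = random.Random(3)
xs = [rng.gauss(0, 1) for _ in range(30)]
stat, p = permutation_pvalue(xs, list(xs), 199, rng)   # Y = X: strongly dependent
print(stat, p)
```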
Remark 2.9. Since $\theta = \mu \times \nu$, the map in Theorem 2.7 is the tensor product of the maps

$$L^2(\mu) \ni F \mapsto \Big(x \mapsto \int d_\mu(x, x')\, F(x')\, d\mu(x')\Big)$$

and

$$L^2(\nu) \ni F \mapsto \Big(y \mapsto \int d_\nu(y, y')\, F(y')\, d\nu(y')\Big).$$

Therefore, the eigenvalues $\lambda_i$ are the products of the eigenvalues of these two maps.



§3. Spaces of Negative Type. 

Corollary 2.8 is incomplete in that it does not specify what happens when $\operatorname{dcov}(\theta) = 0$ and $\theta$ is not a product measure. In order for the statistics $\operatorname{dcov}(\theta_n)$ to give a test for independence that is consistent against all alternatives, it suffices to rule out this missing case. In this section, we show that this case never arises for metric spaces of strong negative type, but otherwise it does. This will require the development of several other theorems of independent interest. We intersperse these theorems with their specializations to Euclidean space.

The puzzle we recalled in the introduction can be stated the following way for a metric space $(\mathcal X, d)$: Let $n \ge 1$ and $x_1, \ldots, x_{2n} \in \mathcal X$. Write $\alpha_i$ for the indicator that $x_i$ is red minus the indicator that $x_i$ is blue. Then $\sum_{i=1}^{2n} \alpha_i = 0$ and

$$\sum_{i, j \le 2n} \alpha_i \alpha_j\, d(x_i, x_j) \le 0.$$

By considering repetitions of $x_i$ and taking limits, we arrive at a superficially more general property: For all $n \ge 1$, $x_1, \ldots, x_n \in \mathcal X$, and $\alpha_1, \ldots, \alpha_n \in \mathbb R$ with $\sum_{i=1}^n \alpha_i = 0$, we have

$$\sum_{i, j \le n} \alpha_i \alpha_j\, d(x_i, x_j) \le 0. \tag{3.6}$$

We say that $(\mathcal X, d)$ has negative type if this property holds. A list of metric spaces of negative type appears as Theorem 3.6 of Meckes (2010); in particular, this includes all $L^p$ spaces for $1 \le p \le 2$. On the other hand, $\mathbb R^n$ with the $\ell^p$-metric is not of negative type whenever $3 \le n \le \infty$ and $2 < p \le \infty$, as proved by Dor (1976) combined with Theorem 2 of Bretagnolle, Dacunha-Castelle, and Krivine (1965/1966); see Koldobsky and Lonke (1999) for an extension to spaces that include some Orlicz spaces, among others.
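Condition (3.6) can be probed numerically. The sketch below evaluates the quadratic form $\sum_{i,j} \alpha_i \alpha_j d(x_i, x_j)$ for random zero-sum weights and random points of Euclidean $\mathbb R^3$, where it should never be positive. For contrast, we add our own small example (not from the text): the shortest-path metric of the complete bipartite graph $K_{2,3}$ with $\alpha = (3, 3, -2, -2, -2)$ gives the value $36 + 48 - 72 = 12 > 0$, so that metric is not of negative type. The function names are hypothetical.

```python
import math
import random


def quadratic_form(points, alphas, dist):
    """sum_{i,j} alpha_i alpha_j d(x_i, x_j); negative type means this is <= 0
    whenever sum_i alpha_i = 0."""
    pts, al = list(points), list(alphas)
    return sum(ai * aj * dist(p, q) for p, ai in zip(pts, al) for q, aj in zip(pts, al))


def zero_sum_weights(rng, n):
    """Random real weights alpha_1, ..., alpha_n, centered so that they sum to 0."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(raw) / n
    return [r - mean for r in raw]


rng = random.Random(4)
worst = max(
    quadratic_form([tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(7)],
                   zero_sum_weights(rng, 7), math.dist)
    for _ in range(300))
print("largest value seen for Euclidean R^3:", worst)

# Our contrast example: vertices 0, 1 on one side of K_{2,3}; 2, 3, 4 on the other.
side = [0, 0, 1, 1, 1]


def k23(u, v):
    """Shortest-path distance in K_{2,3}."""
    if u == v:
        return 0
    return 1 if side[u] != side[v] else 2


print(quadratic_form(range(5), [3, 3, -2, -2, -2], k23))
```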



If we define the $n \times n$ matrix $K$ whose $(i, j)$ entry is $d(x_i, x_j)$, then (3.6) says, by definition, that $K$ is conditionally negative semidefinite. This explains the name "negative type". We can construct another matrix $\widetilde K$ from $K$ that is negative semidefinite as follows: Let $P$ be the orthogonal projection of $\mathbb R^n$ onto the orthocomplement of the constant vectors. Then as operators, $\widetilde K := PKP$. Let $\mu_n$ be the empirical measure of $x_1, \ldots, x_n$. The $(i, j)$ entry of $\widetilde K$ is easily verified to be $d_{\mu_n}(x_i, x_j)$, which begins to explain the appearance of $d_\mu$ in Section 2. We write $\widetilde K \le 0$ to mean that $\widetilde K$ is negative semidefinite.

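If $\mathcal X$ and $\mathcal Y$ are both metric spaces of negative type and $(x_i, y_i) \in \mathcal X \times \mathcal Y$, then let $K$ and $L$ be the distance matrices for $x_i$ and $y_i$, respectively. Let $\theta_n$ be the empirical measure of the sequence $\big((x_i, y_i);\ 1 \le i \le n\big)$. We have $\widetilde K \le 0$ and $\widetilde L \le 0$, whence $\operatorname{tr}(\widetilde K \widetilde L) = \operatorname{tr}\big(\sqrt{-\widetilde K}\sqrt{-\widetilde L}\sqrt{-\widetilde L}\sqrt{-\widetilde K}\big) \ge 0$. That is,

$$0 \le \operatorname{tr}(\widetilde K \widetilde L) = n^2 \operatorname{dcov}(\theta_n).$$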

This begins to explain the origin of dcov. To go further, we use embeddings into Hilbert 
space. 
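The two matrix facts above can be checked directly. The sketch below forms $\widetilde K = PKP$ with $P = I - \frac1n \mathbf 1 \mathbf 1^{\mathsf T}$, compares its entries with $d_{\mu_n}(x_i, x_j)$ computed from the definition in Section 2, and confirms $\operatorname{tr}(\widetilde K \widetilde L) = n^2 \operatorname{dcov}(\theta_n) \ge 0$ for real-valued data with $d(u, v) = |u - v|$. The helper names are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def projected(K):
    """K~ = P K P, where P = I - (1/n) 11^T projects onto the orthocomplement of constants."""
    n = len(K)
    P = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    return matmul(matmul(P, K), P)


def d_mu_n(K):
    """d_{mu_n}(x_i, x_j) computed from the definition in Section 2."""
    n = len(K)
    a = [sum(row) / n for row in K]
    D = sum(a) / n
    return [[K[i][j] - a[i] - a[j] + D for j in range(n)] for i in range(n)]


rng = random.Random(5)
n = 6
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [x + rng.gauss(0, 1) for x in xs]
K = [[abs(u - v) for v in xs] for u in xs]
L = [[abs(u - v) for v in ys] for u in ys]
Kt, Lt = projected(K), projected(L)

trace_KL = sum(Kt[i][j] * Lt[j][i] for i in range(n) for j in range(n))
dcov_n = sum(p * q for ra, rb in zip(d_mu_n(K), d_mu_n(L)) for p, q in zip(ra, rb)) / n ** 2
print(trace_KL, n ** 2 * dcov_n)
```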

Now $\mathcal X$ is of negative type iff there is a Hilbert space $H$ and a map $\phi \colon \mathcal X \to H$ such that $\forall x, x' \in \mathcal X$ $d(x, x') = \|\phi(x) - \phi(x')\|^2$, as shown by Schoenberg (1937, 1938). Indeed, given such a $\phi$, (3.6) is easy to verify: see (3.9) below. For the converse, consider $x_1, \ldots, x_n \in \mathcal X$. Since $\widetilde K \le 0$, there are vectors $v_i \in \mathbb R^n$ such that $\langle v_i, v_j \rangle$ is the $(i, j)$ entry of $-\widetilde K$ for all $i, j$ (the matrix $\sqrt{-\widetilde K}$ has $v_i$ for its $i$th column). Computing $\|v_i - v_j\|^2$ then yields $\|v_i/\sqrt 2 - v_j/\sqrt 2\|^2 = d(x_i, x_j)$. This provides a map defined on the points $x_1, \ldots, x_n$. When we increase the domain of such a $\phi$, the distances of the images already defined are preserved, whence we may embed all these images in a fixed Hilbert space. If $\mathcal X$ is separable, we may thus define $\phi$ on a countable dense subset by induction, and then extend by continuity. In general and alternatively, define

$$d_o(x, x') := \big[d(x, o) + d(o, x') - d(x, x')\big]/2$$

for some fixed $o \in \mathcal X$. Let $V$ be the finitely supported functions on $\mathcal X$. The fact that $\mathcal X$ is of negative type implies that $\langle f, g \rangle := \sum_{x, x' \in \mathcal X} f(x)\, g(x')\, d_o(x, x')$ is a semi-inner product on $V$. The Cauchy-Schwarz inequality implies that $V_0 := \{f \in V;\ \langle f, f \rangle = 0\}$ is a subspace of $V$. Let $H$ be the completion of $V/V_0$. Then the map $\phi \colon x \mapsto \mathbf 1_{\{x\}} + V_0$ has the property desired. Note that $H$ is separable when $\mathcal X$ is.
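On finitely many points, the construction with $d_o$ can be made concrete: the Gram matrix $G_{ij} = d_o(x_i, x_j)$ is positive semidefinite, and any factorization $G = LL^{\mathsf T}$ gives vectors (the rows of $L$) with $\|v_i - v_j\|^2 = G_{ii} + G_{jj} - 2G_{ij} = d(x_i, x_j)$. The sketch below uses a Cholesky factorization as a finite-dimensional stand-in for the completion of $V/V_0$; this route and the names `cholesky` and `embed` are our assumptions. It assumes $G$ is positive definite, which holds here because Euclidean $\mathbb R^2$ has strict negative type and the base point $o$ is not among the $x_i$.

```python
import math
import random


def cholesky(G):
    """Lower-triangular L with G = L L^T, for a positive definite matrix G (list of rows)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(s, 0.0))   # clamp tiny negative rounding
            else:
                L[i][j] = s / L[j][j]
    return L


def embed(points, o, dist):
    """Vectors v_i (rows of L) with <v_i, v_j> = d_o(x_i, x_j), hence
    |v_i - v_j|^2 = d(x_i, x_j); the base point o corresponds to the origin."""
    G = [[(dist(p, o) + dist(o, q) - dist(p, q)) / 2 for q in points] for p in points]
    return cholesky(G)


rng = random.Random(6)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
vecs = embed(pts, (0.0, 0.0), math.dist)
err = max(abs(sum((a - b) ** 2 for a, b in zip(vecs[i], vecs[j])) - math.dist(pts[i], pts[j]))
          for i in range(8) for j in range(8))
print("max | |v_i - v_j|^2 - d(x_i, x_j) |:", err)
```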

Of course, any two isometric embeddings $\phi_1, \phi_2 \colon (\mathcal X, d^{1/2}) \to H$ are equivalent in the sense that there exists an isometry $g \colon H_1 \to H_2$ such that $\phi_2 = g \circ \phi_1$, where $H_i$ is the closed affine span of the image of $\phi_i$: Define $g(\phi_1(x)) := \phi_2(x)$ for $x \in \mathcal X$, extend by affine linearity (which is well defined by a property of Euclidean space), and then extend by continuity. We shall call an isometric embedding $\phi \colon (\mathcal X, d^{1/2}) \to H$ simply an embedding.
A direct proof that $\mathbb R^n$ is of negative type is the following. When $n = 1$, define $\phi(x)$ to be the function $\mathbf 1_{[0, \infty)} - \mathbf 1_{[x, \infty)}$ in $L^2(\mathbb R, \lambda)$, where $\lambda$ is Lebesgue measure. This is easily seen to have the desired property. When $n \ge 2$, define $f_x(s) := \|x - s\|^{-(n-1)/2}$ and $g_x := f_x - f_0$ for $x \in \mathbb R^n$. Then $g_x \in L^2(\mathbb R^n, \lambda_n)$, as calculus shows (for large $s$, we have $g_x(s) = O(\|s\|^{-(n+1)/2})$). Furthermore, there is a constant $c$ such that $\|g_x\|_2 = c\|x\|^{1/2}$ by homogeneity, whence translation invariance gives $\|g_x - g_{x'}\|_2 = \|g_{x - x'}\|_2 = c\|x - x'\|^{1/2}$, so that $\phi(x) := g_x/c$ has the desired property. Call this embedding the Riesz embedding since $f_x(s)$ is a Riesz kernel.

Another embedding $\phi$ for $\mathbb R^n$ is as follows: $\phi(x)$ is the function $s \mapsto c(1 - e^{i s \cdot x})$ in $L^2(F \lambda_n)$ for some constant $c$, where $F(s) := \|s\|^{-(n+1)}$. See Lemma 1 of Székely and Rizzo (2005a) for a proof. This is the Fourier transform of the Riesz embedding, in other words, the composition of the Riesz embedding with the Fourier isometry. We shall refer to this embedding as the Fourier embedding.

Other important embeddings use Brownian motion. When $n = 1$, let $B_x$ be Brownian motion defined for $x \in \mathbb R$ with $B_0 = 0$. We may then define $\phi(x) := B_x$, thought of as a function in $L^2(\mathbf P)$ for some probability measure $\mathbf P$. Likewise, the case $n \ge 2$ can be accomplished by using Lévy's multiparameter Brownian motion. We shall refer to these embeddings as the Brownian embeddings. Sample-path continuity of these Brownian motions plays no role for us; only their Gaussian structure matters. In fact, their existence depends only on the fact that $\mathbb R^n$ has negative type.

An embedding that does not rely on calculation goes as follows: Let $\sigma$ be the (infinite) Borel measure on half-spaces $S \subset \mathbb R^n$ that is invariant under translations and rotations, normalized so that

$$\sigma\big(\{S;\ 0 \in S,\ x \notin S\}\big) = \|x\|/2 \tag{3.7}$$

for $\|x\| = 1$. If we parametrize half-spaces as $S = \{x \in \mathbb R^n;\ z \cdot x < s\}$ with $z \in S^{n-1}$ and $s \in \mathbb R$, then $\sigma = c_n \Omega_n \times \lambda$ for some constant $c_n$, where $\Omega_n$ is volume measure on $S^{n-1}$. Scaling shows that (3.7) holds for all $x$. Now let $\phi(x)$ be the function $S \mapsto \mathbf 1_S(0) - \mathbf 1_S(x)$ in $L^2(\sigma)$. We call this the Crofton embedding, as Crofton (1868) was the first to give a formula for the distance of points in the plane in terms of lines intersecting the segment joining them.

We return now to general metric spaces of negative type. Suppose that $\mu_1, \mu_2 \in M_1(\mathcal X)$ have finite first moments. By approximating $\mu_i$ by probability measures of finite support (e.g., IID samples give V-statistics), we see that when $\mathcal X$ has negative type,

$$D(\mu_1 - \mu_2) \le 0. \tag{3.8}$$
We say that $(\mathcal X, d)$ has strong negative type if it has negative type and equality holds in (3.8) only when $\mu_1 = \mu_2$. When the $\mu_i$ are restricted to measures of finite support, then this is the condition that $(\mathcal X, d)$ be of strict negative type. A simple example of a metric space of non-strict negative type is $\ell^1$ on a 2-point space, i.e., $\mathbb R^2$ with the $\ell^1$-metric.
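A concrete witness for the $\ell^1$ example (our choice of measures, for illustration): let $\mu_1$ be uniform on $\{(0,0), (1,1)\}$ and $\mu_2$ uniform on $\{(1,0), (0,1)\}$. In the $\ell^1$-metric every cross distance is 1 and both same-measure distances are 2, so $D(\mu_1 - \mu_2) = 1 + 1 - 2 = 0$ although $\mu_1 \ne \mu_2$. With the Euclidean metric the same pair gives $D(\mu_1 - \mu_2) = \sqrt 2 - 2 < 0$. The sketch below computes both; the helper name `energy` is hypothetical.

```python
import math


def energy(atoms, weights, dist):
    """D(sigma) = sum_{i,j} w_i w_j d(z_i, z_j) for a finitely supported signed measure sigma."""
    return sum(wi * wj * dist(p, q)
               for p, wi in zip(atoms, weights)
               for q, wj in zip(atoms, weights))


def l1(p, q):
    """The l^1 metric on R^2."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


# sigma = mu_1 - mu_2 with mu_1 uniform on {(0,0), (1,1)} and mu_2 uniform on {(1,0), (0,1)}.
atoms = [(0, 0), (1, 1), (1, 0), (0, 1)]
weights = [0.5, 0.5, -0.5, -0.5]
print(energy(atoms, weights, l1))          # equals 0: equality in (3.8) with mu_1 != mu_2
print(energy(atoms, weights, math.dist))   # equals sqrt(2) - 2, strictly negative
```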

Consider an embedding $\phi$ as above. Define the (linear) barycenter map $\beta = \beta_\phi \colon \mu \mapsto \int \phi(x)\, d\mu(x)$ on the set of measures $\mu \in M(\mathcal X)$ with finite first moment. (Although it suffices that $\int d(o, x)^{1/2}\, d|\mu|(x) < \infty$ to define $\beta(\mu)$, this will not suffice for our purposes.) Note that

$$\iint d(x_1, x_2)\, d\mu_1(x_1)\, d\mu_2(x_2) = -2\langle \beta(\mu_1), \beta(\mu_2) \rangle$$

when $\mu_i \in M(\mathcal X)$ satisfy $\mu_i(\mathcal X) = 0$. In particular,

$$D(\mu) = -2\|\beta(\mu)\|^2 \tag{3.9}$$

when $\mu \in M(\mathcal X)$ satisfies $\mu(\mathcal X) = 0$. Thus:

Proposition 3.10. Let $\mathcal X$ have negative type as witnessed by the embedding $\phi$. Then $\mathcal X$ is of strong negative type iff the barycenter map $\beta_\phi$ is injective on the set of probability measures on $\mathcal X$ with finite first moment.

For example, Euclidean spaces have strong negative type; this is most directly seen via the Fourier embedding, since then $\beta(\mu)$ is the function $s \mapsto c\big(1 - \hat\mu(s)\big)$, where $\hat\mu$ is the Fourier transform of $\mu \in M_1(\mathbb R^n)$. Alternatively, one can see this via the Crofton embedding and the Cramér-Wold device, but the only decent proof of that device uses Fourier transforms. (Of course, in one dimension, the Crofton embedding is simple and easily shows that $\mathbb R$ has strong negative type without the use of Fourier transforms.) The barycenter of $\mu$ for the Riesz embedding is essentially the Riesz potential of $\mu$; more precisely, if $\mu$ and $\mu'$ are probability measures with finite first moment, then up to a constant factor, $\beta(\mu - \mu')$ is the Riesz potential of $\mu - \mu'$ for the exponent $(n - 1)/2$.

Remark 3.11. Another way of saying Proposition 3.10 is that a metric space $(\mathcal X, d)$ has strong negative type iff the map $(\mu_1, \mu_2) \mapsto \sqrt{-D(\mu_1 - \mu_2)/2}$ is a metric on the set of probability measures on $\mathcal X$ with finite first moment, in which case it extends the metric on $(\mathcal X, d^{1/2})$ when we identify $x \in \mathcal X$ with the point mass at $x$.

Remark 3.12. Here we give an example of a metric space of strict negative type that is not of strong negative type. In fact, it fails the condition for probability measures with countable support. The question amounts to whether, given a subset of a Hilbert space in which no 3 points form an obtuse triangle and such that the barycenter of every finitely supported probability measure determines the measure uniquely, the barycenter of every probability measure determines the measure uniquely. The answer is no. For example, let $(e_i)$ be an orthonormal basis of a Hilbert space. The desired subset consists of the vectors

$$e_1,\quad e_1 + e_2/2,\quad e_2 + e_3,\quad e_3 + e_4/2,\quad e_4 + e_5,\quad e_5 + e_6/2,\quad \text{etc.}$$

It is obvious that finite convex combinations are unique and that there are no obtuse angles. But if $v_n$ denotes the $n$th vector, then

$$v_1/2 + v_3/4 + v_5/8 + \cdots = v_2/2 + v_4/4 + v_6/8 + \cdots.$$
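The displayed identity can be checked with exact arithmetic. Truncating both sides after $K$ terms, the difference telescopes to the single vector $-e_{2K}/2^{K+1}$, whose norm tends to 0, while the two sides are convex combinations (weights $1/2, 1/4, \ldots$ summing to 1) over disjoint sets of vectors. The sketch below represents each $v_n$ as a sparse dictionary of coordinates; the representation and the function names are our own.

```python
from fractions import Fraction


def v(n):
    """n-th vector of Remark 3.12 as a sparse dict {basis index: coefficient}."""
    if n == 1:
        return {1: Fraction(1)}                          # e_1
    if n % 2 == 0:
        return {n - 1: Fraction(1), n: Fraction(1, 2)}   # e_{n-1} + e_n / 2
    return {n - 1: Fraction(1), n: Fraction(1)}          # e_{n-1} + e_n


def partial_difference(K):
    """Coefficients of sum_{k<=K} v_{2k-1}/2^k - sum_{k<=K} v_{2k}/2^k (zeros dropped)."""
    acc = {}
    for k in range(1, K + 1):
        for vec, sign in ((v(2 * k - 1), 1), (v(2 * k), -1)):
            for idx, c in vec.items():
                acc[idx] = acc.get(idx, Fraction(0)) + sign * c / 2 ** k
    return {idx: c for idx, c in acc.items() if c != 0}


for K in range(1, 6):
    print(K, partial_difference(K))
```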

Remark 3.13. If $\mathcal X$ is a metric space of negative type, then $a \colon \mu \mapsto a_\mu$ is injective on $\mu \in M_1(\mathcal X)$ with finite first moment iff $\mathcal X$ has strong negative type. Part of this statement is contained in Theorem 3.6 of Nickolas and Wolf (2009). To prove it, let $\phi$ be an embedding of $\mathcal X$ such that 0 lies in the image of $\phi$, which we may achieve by translation. Then

$$a_\mu(x) = \|\phi(x)\|^2 - 2\langle \phi(x), \beta_\phi(\mu) \rangle + \int \|\phi(x')\|^2\, d\mu(x'),$$

whence $a_\mu = a_{\mu'}$ iff $\langle \phi(x), \beta_\phi(\mu) \rangle = \langle \phi(x), \beta_\phi(\mu') \rangle$ for all $x$ (first use $x$ so that $\phi(x) = 0$) iff $\langle z, \beta_\phi(\mu) \rangle = \langle z, \beta_\phi(\mu') \rangle$ for all $z$ in the closed linear span of the image of $\phi$ iff $\beta_\phi(\mu) = \beta_\phi(\mu')$. Now apply Proposition 3.10. On the other hand, there are metric spaces not of negative type for which $a$ is injective on the probability measures: e.g., take a finite metric space whose distance matrix is nonsingular, i.e., in which the distance functions $d(\cdot, o)$ to the fixed points $o$ are linearly independent. The map $a$ is injective also for all separable $L^p$ spaces ($1 < p < \infty$): see Linde (1986b) or Gorin and Koldobskiĭ.



Given an $H$-valued random variable $Z$ with finite first moment, we define its variance to be $\operatorname{Var}(Z) := \mathbf E\big[\|Z - \mathbf E[Z]\|^2\big]$.

Proposition 3.14. If $\mathcal X$ has negative type as witnessed by the embedding $\phi$ and $\mu \in M_1(\mathcal X)$ has finite first moment, then for all $x, x' \in \mathcal X$,

$$a_\mu(x) = \|\phi(x) - \beta_\phi(\mu)\|^2 + D(\mu)/2,$$

$D(\mu) = 2\operatorname{Var}(\phi(X))$ if $X \sim \mu$, and

$$d_\mu(x, x') = -2\big\langle \phi(x) - \beta_\phi(\mu),\ \phi(x') - \beta_\phi(\mu) \big\rangle.$$

Proof. Let $X \sim \mu$. We have

$$a_\mu(x) = \mathbf E[d(x, X)] = \mathbf E\big[\|\phi(x) - \phi(X)\|^2\big] = \mathbf E\Big[\big\|\big(\phi(x) - \beta_\phi(\mu)\big) - \big(\phi(X) - \beta_\phi(\mu)\big)\big\|^2\Big] = \|\phi(x) - \beta_\phi(\mu)\|^2 + \operatorname{Var}(\phi(X)).$$

Integrating over $x$ gives the first two identities. Substituting the first identity into the definition of $d_\mu$ gives the last identity. ∎
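Proposition 3.14 can be verified exactly on the real line using the one-dimensional embedding $\phi(x) = \mathbf 1_{[0,\infty)} - \mathbf 1_{[x,\infty)}$ from above. Its inner products are explicit: $\langle \phi(x), \phi(y) \rangle = \min(|x|, |y|)$ when $x$ and $y$ have the same sign, and 0 otherwise. Expanding every norm and inner product through this Gram function, the sketch below (with hypothetical names `gram` and `prop_3_14_residuals`) returns the residuals of the three identities for a finitely supported $\mu$. All three should vanish up to rounding.

```python
import random


def gram(x, y):
    """<phi(x), phi(y)> in L^2(R) for phi(x) = 1_[0, inf) - 1_[x, inf),
    so that |phi(x) - phi(y)|^2 = |x - y|."""
    return min(abs(x), abs(y)) if x * y > 0 else 0.0


def prop_3_14_residuals(atoms, weights, x, xp):
    """Residuals of the three identities of Proposition 3.14 for mu = sum w_i delta_{t_i}."""
    def a_mu(u):
        return sum(w * abs(u - t) for t, w in zip(atoms, weights))

    D = sum(w * a_mu(t) for t, w in zip(atoms, weights))
    beta_sq = sum(wi * wj * gram(s, t) for s, wi in zip(atoms, weights)
                  for t, wj in zip(atoms, weights))              # |beta(mu)|^2

    def centered_inner(u, w_):
        """<phi(u) - beta(mu), phi(w_) - beta(mu)>."""
        return (gram(u, w_) - sum(w * gram(u, t) for t, w in zip(atoms, weights))
                - sum(w * gram(w_, t) for t, w in zip(atoms, weights)) + beta_sq)

    var = sum(w * centered_inner(t, t) for t, w in zip(atoms, weights))   # Var(phi(X))
    d_mu = abs(x - xp) - a_mu(x) - a_mu(xp) + D
    return (a_mu(x) - (centered_inner(x, x) + D / 2),
            D - 2 * var,
            d_mu + 2 * centered_inner(x, xp))


rng = random.Random(7)
atoms = [rng.uniform(-3, 3) for _ in range(5)]
raw = [rng.random() for _ in range(5)]
total = sum(raw)
weights = [r / total for r in raw]
print(prop_3_14_residuals(atoms, weights, 0.4, -1.7))
```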

For simplicity, we may, without loss of generality, work only with real Hilbert spaces. Let $\mathcal X$ and $\mathcal Y$ be metric spaces of negative type, witnessed by the embeddings $\phi$ and $\psi$, respectively. Consider the tensor embedding $(x, y) \mapsto \phi(x) \otimes \psi(y)$ of $\mathcal X \times \mathcal Y \to H \otimes H$. This will be the key to analyzing when $\operatorname{dcov}(\theta) = 0$. Recall that the inner product on $H \otimes H$ satisfies $\langle h_1 \otimes h'_1, h_2 \otimes h'_2 \rangle = \langle h_1, h_2 \rangle \langle h'_1, h'_2 \rangle$.

Remark 3.15. Although we shall not need it, we may give $\mathcal X \times \mathcal Y$ the associated "metric"

$$d_{\phi\otimes\psi}\big((x, y), (x', y')\big) := \|\phi(x) \otimes \psi(y) - \phi(x') \otimes \psi(y')\|^2,$$

so necessarily it is of negative type. Actually, one can check that this need not satisfy the triangle inequality when the origin is not in the images of $\phi$ and $\psi$, but, following a suggestion of ours, Leonard Schulman (personal communication, 2010) showed that it is indeed a metric when the images of $\phi$ and $\psi$ both contain the origin. Since we may translate $\phi$ and $\psi$ so that this holds, we may take this to be a metric if we wish. In this case, one can also express $d_{\phi\otimes\psi}$ in terms of the original metrics on $\mathcal X$ and $\mathcal Y$.



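Proposition 3.16. Let $\mathcal X$, $\mathcal Y$ have negative type as witnessed by the embeddings $\phi$, $\psi$. Let $\theta \in M_1(\mathcal X \times \mathcal Y)$ have marginals $\mu \in M_1(\mathcal X)$ and $\nu \in M_1(\mathcal Y)$, both with finite first moment. Then $\theta \circ (\phi \otimes \psi)^{-1}$ has finite first moment, so that $\beta_{\phi\otimes\psi}(\theta)$ is defined, and we have that

$$\operatorname{dcov}(\theta) = 4\|\beta_{\phi\otimes\psi}(\theta - \mu \times \nu)\|^2.$$

Proof. Write $\tilde\phi := \phi - \beta_\phi(\mu)$ and $\tilde\psi := \psi - \beta_\psi(\nu)$. By Proposition 3.14, we have

$$\begin{aligned}
\operatorname{dcov}(\theta) &= 4\int \langle \tilde\phi(x), \tilde\phi(x') \rangle \langle \tilde\psi(y), \tilde\psi(y') \rangle\, d\theta^2\big((x, y), (x', y')\big) \\
&= 4\int \langle \tilde\phi(x) \otimes \tilde\psi(y), \tilde\phi(x') \otimes \tilde\psi(y') \rangle\, d\theta^2\big((x, y), (x', y')\big) \\
&= 4\|\beta_{\tilde\phi\otimes\tilde\psi}(\theta)\|^2.
\end{aligned}$$

In addition, since $\|\tilde\phi(x)\| \in L^2(\mu)$ and $\|\tilde\psi(y)\| \in L^2(\nu)$, we have $\|\tilde\phi(x) \otimes \tilde\psi(y)\| \in L^1(\theta)$ by the Cauchy-Schwarz inequality, whence $\beta_{\tilde\phi\otimes\tilde\psi}(\theta)$ is defined and

$$\begin{aligned}
\beta_{\tilde\phi\otimes\tilde\psi}(\theta) &= \int \tilde\phi(x) \otimes \tilde\psi(y)\, d\theta(x, y) = \int \big(\phi(x) - \beta_\phi(\mu)\big) \otimes \big(\psi(y) - \beta_\psi(\nu)\big)\, d\theta(x, y) \\
&= \int \phi(x) \otimes \psi(y)\, d\theta(x, y) - \beta_\phi(\mu) \otimes \beta_\psi(\nu) = \beta_{\phi\otimes\psi}(\theta - \mu \times \nu). \qquad ∎
\end{aligned}$$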
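For an empirical measure $\theta_n$ on $\mathbb R \times \mathbb R$, Proposition 3.16 can be checked exactly. Take both $\phi$ and $\psi$ to be the one-dimensional embedding $\mathbf 1_{[0,\infty)} - \mathbf 1_{[x,\infty)}$. The signed measure $\theta_n - \mu_n \times \nu_n$ puts mass $s_{ab} = [a = b]/n - 1/n^2$ on $(x_a, y_b)$, so $\|\beta_{\phi\otimes\psi}(\theta_n - \mu_n \times \nu_n)\|^2 = \sum_{a,b,c,e} s_{ab}\, s_{ce}\, G^X_{ac}\, G^Y_{be}$ with Gram matrices $G^X$, $G^Y$. The sketch below (hypothetical function names) compares four times this quantity with $\operatorname{dcov}(\theta_n)$ computed by double centering.

```python
import random


def gram(x, y):
    """<phi(x), phi(y)> for the 1-D embedding phi(x) = 1_[0,inf) - 1_[x,inf) in L^2(R)."""
    return min(abs(x), abs(y)) if x * y > 0 else 0.0


def dcov_centering(xs, ys):
    """dcov(theta_n) via doubly centered |u - v| matrices."""
    n = len(xs)

    def centered(v):
        m = [[abs(p - q) for q in v] for p in v]
        row = [sum(r) / n for r in m]
        grand = sum(row) / n
        return [[m[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

    A, B = centered(xs), centered(ys)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcov_barycenter(xs, ys):
    """4 |beta_{phi (x) psi}(theta_n - mu_n x nu_n)|^2, expanded through Gram matrices."""
    n = len(xs)
    s = [[(1.0 if a == b else 0.0) / n - 1.0 / n ** 2 for b in range(n)] for a in range(n)]
    GX = [[gram(u, v) for v in xs] for u in xs]
    GY = [[gram(u, v) for v in ys] for u in ys]
    # U = s GY s^T, so |beta|^2 = sum_{a,c} GX[a][c] U[a][c].
    T = [[sum(s[a][b] * GY[b][e] for b in range(n)) for e in range(n)] for a in range(n)]
    U = [[sum(T[a][e] * s[c][e] for e in range(n)) for c in range(n)] for a in range(n)]
    return 4 * sum(GX[a][c] * U[a][c] for a in range(n) for c in range(n))


rng = random.Random(8)
xs = [rng.gauss(0, 1) for _ in range(7)]
ys = [abs(x) + 0.3 * rng.gauss(0, 1) for x in xs]
print(dcov_centering(xs, ys), dcov_barycenter(xs, ys))
```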



In the special case where $\mathcal X$ and $\mathcal Y$ are Euclidean spaces and the embeddings $\phi$, $\psi$ are the Fourier embeddings, Proposition 3.16 shows that dcov coincides with (the square of) the original definition of distance covariance in [SRB] (see (2.6) there); while if the embeddings are the Brownian embeddings, then Proposition 3.16 shows that distance covariance is the same as Brownian covariance (Theorem 8 of Székely and Rizzo (2009); the condition there that $X$ and $Y$ have finite second moments is thus seen to be superfluous). For $\theta \in M_1(\mathbb R^p \times \mathbb R^q)$ with marginals $\mu$, $\nu$ having finite first moments, the Crofton embedding gives that $\beta_{\phi\otimes\psi}(\theta - \mu \times \nu)$ is the function

$$(z, s, w, t) \mapsto \theta(z \cdot x < s,\ w \cdot y < t) - \mu(z \cdot x < s)\, \nu(w \cdot y < t)$$

in $L^2(\sigma_p \times \sigma_q)$, where $\sigma_p = c_p \Omega_p \times \lambda$ and $\sigma_q = c_q \Omega_q \times \lambda$ are the half-space measures on $\mathbb R^p$ and $\mathbb R^q$. Hence, for $(X, Y) \in \mathbb R^p \times \mathbb R^q$, Proposition 3.16 shows that

$$\operatorname{dcov}(X, Y) = 4 c_p c_q \iint \big|\mathbf P[z \cdot X < s,\ w \cdot Y < t] - \mathbf P[z \cdot X < s]\, \mathbf P[w \cdot Y < t]\big|^2\, d(\Omega_p \times \Omega_q)(z, w)\, d\lambda^2(s, t).$$

When $p = q = 1$, this formula was shown to us by Gábor Székely (personal communication, 2010).

Write $M^1(\mathcal X)$ for the subset of $\mu \in M(\mathcal X)$ such that $|\mu|$ has a finite first moment. Write $M^{1,1}(\mathcal X \times \mathcal Y)$ for the subset of $\theta \in M(\mathcal X \times \mathcal Y)$ such that both marginals of $|\theta|$ have finite first moment.

Lemma 3.17. Let $\mathcal X$, $\mathcal Y$ have negative type as witnessed by the embeddings $\phi$, $\psi$. If $\phi$ and $\psi$ have the property that $\beta_\phi$ and $\beta_\psi$ are injective on $M^1(\mathcal X)$ and $M^1(\mathcal Y)$, respectively (not merely on the probability measures), then $\beta_{\phi\otimes\psi}$ is injective on $M^{1,1}(\mathcal X \times \mathcal Y)$.

Proof. Let $\theta\in M^{1,1}(\mathcal X\times\mathcal Y)$ satisfy $\beta_{\phi\otimes\psi}(\theta) = 0$. For $k\in H$, define the bounded linear map $T_k : H\otimes H\to H$ by linearity, continuity, and

$$T_k(u\otimes v) := \langle u,k\rangle\,v .$$
More precisely, one uses the above definition on $e_i\otimes e_j$ for an orthonormal basis $\{e_i\}$ of $H$ and then extends. Also, define

$$\nu_k(B) := \int \langle\phi(x),k\rangle\,\mathbf 1_B(y)\,d\theta(x,y) \qquad (B\subseteq\mathcal Y\ \text{Borel}),$$

so that

$$\beta_\psi(\nu_k) = \int \langle\phi(x),k\rangle\,\psi(y)\,d\theta(x,y) = \int T_k\bigl(\phi(x)\otimes\psi(y)\bigr)\,d\theta(x,y) = T_k\bigl(\beta_{\phi\otimes\psi}(\theta)\bigr) = 0 .$$

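This implies that $\nu_k = 0$ by injectivity of $\beta_\psi$. As this is valid for each $k\in H$, we obtain that for every Borel $B\subseteq\mathcal Y$,

$$\int \phi(x)\,\mathbf 1_B(y)\,d\theta(x,y) = 0 .$$

Defining

$$\mu_B(A) := \theta(A\times B) \qquad (A\subseteq\mathcal X\ \text{Borel}),$$

we have $\beta_\phi(\mu_B) = \int \phi(x)\,\mathbf 1_B(y)\,d\theta(x,y) = 0$, whence $\mu_B = 0$ by injectivity of $\beta_\phi$. In other words, $\theta(A\times B) = 0$ for every pair of Borel sets $A$ and $B$. Since such product sets generate the product $\sigma$-field on $\mathcal X\times\mathcal Y$, it follows that $\theta = 0$. $\blacksquare$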
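In finite dimensions the map $T_k$ is easy to write down: identifying $u\otimes v$ with the outer-product matrix $uv^{\mathsf T}$, we get $T_k(M) = M^{\mathsf T}k$. The sketch below (hypothetical random data; for simplicity $\phi$ and $\psi$ take values in spaces of different dimensions) checks the identity $\beta_\psi(\nu_k) = T_k(\beta_{\phi\otimes\psi}(\theta))$ used in the proof, for a signed $\theta$ on a finite product space.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def T(k, M):
    """T_k(M) = M^T k; on a rank-one matrix outer(u, v) this gives <u, k> v."""
    return [sum(M[a][b] * k[a] for a in range(len(k))) for b in range(len(M[0]))]

random.seed(2)
nx, ny, da, db = 3, 4, 2, 3
phi = [[random.gauss(0, 1) for _ in range(da)] for _ in range(nx)]
psi = [[random.gauss(0, 1) for _ in range(db)] for _ in range(ny)]
theta = [[random.uniform(-1, 1) for _ in range(ny)] for _ in range(nx)]   # a signed measure
k = [random.gauss(0, 1) for _ in range(da)]

# beta_{phi (x) psi}(theta), stored as a da x db matrix.
beta = [[sum(theta[x][y] * phi[x][a] * psi[y][b] for x in range(nx) for y in range(ny))
         for b in range(db)] for a in range(da)]
# nu_k({y}) = sum_x <phi(x), k> theta(x, y), followed by its barycenter under psi.
nu_k = [sum(dot(phi[x], k) * theta[x][y] for x in range(nx)) for y in range(ny)]
lhs = [sum(nu_k[y] * psi[y][b] for y in range(ny)) for b in range(db)]
rhs = T(k, beta)
print(lhs, rhs)   # equal up to rounding
```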

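Lemma 3.18. Let $\mathcal X$ have strong negative type. There exists an embedding $\phi$ so that $\beta_\phi$ is injective on $M^1(\mathcal X)$ (not merely on the probability measures).

Proof. If $\phi:\mathcal X\to H$ is an embedding that induces an injective barycenter map on $M_1^1(\mathcal X)$, then the map $x\mapsto(\phi(x),1)\in H\times\mathbb R$ is an embedding that induces an injective barycenter map on $M^1(\mathcal X)$. Indeed, this map does not change distances, and the barycenter of $\mu\in M^1(\mathcal X)$ under it is $(\beta_\phi(\mu),\mu(\mathcal X))$; if this vanishes, then writing $\mu = a\mu_1 - b\mu_2$ with $\mu_1,\mu_2\in M_1^1(\mathcal X)$ and $a,b\ge0$, we get $a = b$ and, when $a>0$, $\beta_\phi(\mu_1) = \beta_\phi(\mu_2)$, whence $\mu = 0$. $\blacksquare$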
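A concrete case shows why the extra coordinate in Lemma 3.18 is needed. The sketch below uses a hypothetical embedding of the path $\{0,\ldots,m\}$ with its usual metric into $\mathbb R^m$; since it sends $0$ to the origin, the point mass at $0$ has barycenter $0$, so $\beta_\phi$ is not injective on signed measures, whereas $x\mapsto(\phi(x),1)$ records the total mass and leaves all distances unchanged.

```python
def line_embedding(x, m):
    """Hypothetical embedding of {0, ..., m} with metric |x - x'| into R^m:
    the first x coordinates are 1, so ||phi(x) - phi(x')||^2 = |x - x'|."""
    return [1.0 if i < x else 0.0 for i in range(m)]

def augmented(x, m):
    """x -> (phi(x), 1) in R^m x R, as in the proof of Lemma 3.18."""
    return line_embedding(x, m) + [1.0]

def barycentre(measure, emb, m):
    """Barycenter of a finitely supported signed measure {point: mass} under emb."""
    vecs = {x: emb(x, m) for x in measure}
    dim = len(next(iter(vecs.values())))
    return [sum(w * vecs[x][i] for x, w in measure.items()) for i in range(dim)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

m = 3
delta0 = {0: 1.0}                              # a nonzero signed measure
print(barycentre(delta0, line_embedding, m))   # [0.0, 0.0, 0.0]
print(barycentre(delta0, augmented, m))        # [0.0, 0.0, 0.0, 1.0]
```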

Remark 3.19. We may choose the embeddings so that $d_{\phi\otimes\psi}$ is a metric and $\beta_{\phi\otimes\psi}$ is injective on $M^{1,1}(\mathcal X\times\mathcal Y)$, which yields that $d_{\phi\otimes\psi}$ is of strong negative type by Proposition 3.10.

Indeed, first translate $\phi$ and $\psi$ so that each contains $0$ in its image. This makes $d_{\phi\otimes\psi}$ a metric by Remark 3.15. Then use the embedding $x\mapsto(\phi(x),1)$ and likewise for $\psi$. This does not change the metric.

As we observed in Section 2, it is immediate from the definition that if $\theta$ is a product measure, then $\mathrm{dcov}(\theta) = 0$. A converse, and the key result of the theory, holds for metric spaces of strong negative type:

Theorem 3.20. Suppose that both $\mathcal X$ and $\mathcal Y$ have strong negative type and $\theta$ is a probability measure on $\mathcal X\times\mathcal Y$ whose marginals have finite first moment. If $\mathrm{dcov}(\theta) = 0$, then $\theta$ is a product measure.

This is an immediate corollary of Proposition 3.16 and Lemmas 3.17 and 3.18. Therefore, Corollary 2.8 gives a test for independence that is consistent against all alternatives when $\mathcal X$ and $\mathcal Y$ both have strong negative type. See Theorem 6 of [SRB] for the significance levels of the test.
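The following sketch runs such a test on real-valued samples. It calibrates $n\,\mathrm{dcov}(\theta_n)$ by permuting the $y$-values, which is a swapped-in technique and not the asymptotic calibration of [SRB]; because a permutation leaves $\mu_n$ and $\nu_n$ unchanged, the resulting p-value would be the same for any statistic that divides $n\,\mathrm{dcov}(\theta_n)$ by a quantity depending only on the marginals. The function names, the number of permutations, and the seed are our choices.

```python
import random

def dcov_sample(xs, ys):
    """dcov of the empirical measure of the pairs (xs[k], ys[k]), by double centring |x - x'|."""
    n = len(xs)
    def centred(v):
        d = [[abs(a - b) for b in v] for a in v]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]
    A, B = centred(xs), centred(ys)
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n ** 2

def permutation_test(xs, ys, n_perm=199, seed=0):
    """Return (n * dcov(theta_n), permutation p-value); both are our illustrative choices."""
    rng = random.Random(seed)
    n = len(xs)
    observed = n * dcov_sample(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # breaks any dependence while keeping both marginals
        if n * dcov_sample(xs, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

rng = random.Random(3)
xs = [rng.gauss(0, 1) for _ in range(30)]
ys = [x + 0.1 * rng.gauss(0, 1) for x in xs]     # strongly dependent on xs
stat, p_value = permutation_test(xs, ys)
```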



For the Fourier embedding of Euclidean space, Theorem 3.20 amounts to the fact that $\theta = \mu\times\nu$ if the Fourier transform of $\theta$ is the (tensor) product of the Fourier transforms of $\mu$ and $\nu$. This was the motivation presented in [SRB] for dCov.



Remark 3.21. In the case of categorical data, we may embed each data space as a simplex with edges of unit length. Let the corresponding Hilbert-space vectors be $e_x/\sqrt2$ and $f_y/\sqrt2$, where the $e_x$ are orthonormal and the $f_y$ are orthonormal. The product space then embeds as a simplex on the orthogonal vectors $e_x\otimes f_y/2$, and the barycenter of $\theta$ is $\sum_{x,y}\theta(x,y)\,e_x\otimes f_y/2$. Let $\theta_n$, $\mu_n$, and $\nu_n$ be the empirical measures as in Corollary 2.8. Proposition 3.16 yields

$$\mathrm{dcov}(\theta_n) = \sum_{x,y}\bigl[\theta_n(x,y) - \mu_n(x)\nu_n(y)\bigr]^2 .$$



The test statistic in (2.5) is thus

For comparison, Pearson's $\chi^2$-statistic is

$$n\sum_{x,y}\frac{\bigl[\theta_n(x,y) - \mu_n(x)\nu_n(y)\bigr]^2}{\mu_n(x)\nu_n(y)} .$$
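The identity of Remark 3.21 can be checked directly. The sketch below (hypothetical data and function names) computes $\sum_{x,y}[\theta_n(x,y)-\mu_n(x)\nu_n(y)]^2$ from counts, compares it with dcov of the empirical measure computed by double centring under the discrete metric (distance 1 between distinct categories), and also returns Pearson's statistic for comparison.

```python
from collections import Counter

def categorical_dcov_and_chi2(pairs):
    """Return (sum [theta_n - mu_n nu_n]^2, Pearson chi^2) for a list of (x, y) categories."""
    n = len(pairs)
    joint = Counter(pairs)
    mu = Counter(x for x, _ in pairs)
    nu = Counter(y for _, y in pairs)
    dcov = chi2 = 0.0
    for x in mu:
        for y in nu:
            expected = (mu[x] / n) * (nu[y] / n)      # mu_n(x) nu_n(y)
            diff = joint[(x, y)] / n - expected       # theta_n(x, y) - mu_n(x) nu_n(y)
            dcov += diff ** 2
            chi2 += n * diff ** 2 / expected
    return dcov, chi2

def dcov_double_centred(pairs, metric):
    """dcov of the empirical measure via double-centred distance matrices."""
    n = len(pairs)
    def centred(v):
        d = [[metric(a, b) for b in v] for a in v]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]
    A = centred([x for x, _ in pairs])
    B = centred([y for _, y in pairs])
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n ** 2

def discrete(a, b):
    return 0.0 if a == b else 1.0

pairs = [("a", "u")] * 5 + [("a", "v")] * 2 + [("b", "u")] + [("b", "v")] * 4 + [("c", "v")] * 3
dcov, chi2 = categorical_dcov_and_chi2(pairs)
```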



Remark 3.22. As Gabor Szekely has remarked (personal communication, 2010), there is a 2-dimensional random variable $(X,Y)$ such that $X$ and $Y$ are not independent, yet if $(X',Y')$ is an independent copy of $(X,Y)$, then $|X-X'|$ and $|Y-Y'|$ are uncorrelated. Indeed, consider the density function $p(x,y) := \bigl(1/4 - q(x)q(y)\bigr)\mathbf 1_{[-1,1]^2}(x,y)$ with $q := -(c/2)\mathbf 1_{[-1,0]} + (1/2)\mathbf 1_{(0,c)}$, where $c := \sqrt2-1$. Then it is not hard to check that this gives such an example.
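The claim in Remark 3.22 can be verified exactly, because the density is constant on products of the three intervals $[-1,0]$, $(0,c)$, $[c,1]$. Given the cells containing $(X,Y)$ and $(X',Y')$, the four coordinates are independent and uniform on their intervals, and $\mathbf E|U-V|$ equals $|m-m'|$ for independent uniforms on disjoint intervals with midpoints $m$, $m'$, and $L/3$ for two independent uniforms on the same interval of length $L$. The sketch (our own bookkeeping, hypothetical names) uses this to compute $\mathrm{Cov}(|X-X'|,|Y-Y'|)$ without numerical integration, for $c = \sqrt2-1$ and, for contrast, $c = 1/2$.

```python
import math

def cov_abs_differences(c):
    """Cov(|X - X'|, |Y - Y'|) for the density (1/4 - q(x) q(y)) on [-1, 1]^2 with
    q = -(c/2) 1_[-1,0] + (1/2) 1_(0,c), computed exactly cell by cell."""
    cells = [(-1.0, 0.0), (0.0, c), (c, 1.0)]              # q is constant on each cell
    q_vals = [-c / 2, 0.5, 0.0]
    length = [b - a for a, b in cells]
    mid = [(a + b) / 2 for a, b in cells]
    q_mass = [qv * L for qv, L in zip(q_vals, length)]      # integral of q over each cell
    # P[i][j] = probability that (X, Y) lies in cell i x cell j.
    P = [[length[i] * length[j] / 4 - q_mass[i] * q_mass[j] for j in range(3)] for i in range(3)]
    # E[i][k] = E|U - V| for independent U ~ Unif(cell i), V ~ Unif(cell k).
    E = [[length[i] / 3 if i == k else abs(mid[i] - mid[k]) for k in range(3)] for i in range(3)]
    r = range(3)
    joint = sum(P[i][j] * P[k][l] * E[i][k] * E[j][l] for i in r for j in r for k in r for l in r)
    px = [sum(P[i][j] for j in r) for i in r]               # cell probabilities of X
    py = [sum(P[i][j] for i in r) for j in r]               # cell probabilities of Y
    mean_x = sum(px[i] * px[k] * E[i][k] for i in r for k in r)
    mean_y = sum(py[j] * py[l] * E[j][l] for j in r for l in r)
    return joint - mean_x * mean_y, P, px, py

cov_star, P, px, py = cov_abs_differences(math.sqrt(2) - 1)   # vanishes up to rounding
cov_half, *_ = cov_abs_differences(0.5)                       # strictly positive
```

The comparison of `P[0][0]` with `px[0] * py[0]` exhibits the dependence of $X$ and $Y$.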



Remark 3.23. According to Proposition 3.16, $\mathrm{dcov}(\theta) = -2D(\theta-\mu\times\nu)$ for the metric space $(\mathcal X\times\mathcal Y, d_{\phi\otimes\psi})$. Since this metric space has strong negative type when $\mathcal X$ and $\mathcal Y$ do, we can view the fact that $\mathrm{dcov}(\theta) = 0$ only for product measures as a special case of the fact that $D(\theta_1-\theta_2) = 0$ only when $\theta_1 = \theta_2$ for $\theta_i\in M_1^1(\mathcal X\times\mathcal Y)$. Similarly, any other metric on $\mathcal X\times\mathcal Y$ of strong negative type could be used to give a test of independence via $D(\theta-\mu\times\nu)$; indeed, when $\mathcal X = \mathbb R^p$ and $\mathcal Y = \mathbb R^q$, the Euclidean metric on $\mathbb R^{p+q}$ was used by Bakirov, Rizzo, and Szekely (2006) for precisely such a test.
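For $p = q = 1$ the quantity $D(\theta_n-\mu_n\times\nu_n)$ with the Euclidean metric on $\mathbb R^2$ is a finite double sum over the $n$ atoms of $\theta_n$ and the $n^2$ atoms of $\mu_n\times\nu_n$. The sketch below (hypothetical names; it is not claimed to reproduce the statistic or normalization of Bakirov, Rizzo, and Szekely) computes it and illustrates that it is never positive, as negative type requires, and is strictly negative when $\theta_n$ is not a product measure.

```python
import math
import random

def D_signed(atoms, dist):
    """D(lambda) = sum_{a,b} lambda(a) lambda(b) d(a, b) for a finitely supported signed
    measure given as a list of (point, mass) pairs."""
    return sum(w1 * w2 * dist(p1, p2) for p1, w1 in atoms for p2, w2 in atoms)

def D_joint_minus_product(xs, ys):
    n = len(xs)
    joint = [((x, y), 1 / n) for x, y in zip(xs, ys)]            # theta_n
    product = [((x, y), -1 / n ** 2) for x in xs for y in ys]    # -(mu_n x nu_n)
    return D_signed(joint + product, math.dist)                  # Euclidean metric on R^2

rng = random.Random(5)
xs = [rng.gauss(0, 1) for _ in range(12)]
dependent = D_joint_minus_product(xs, [x ** 2 for x in xs])
independent = D_joint_minus_product(xs, [rng.gauss(0, 1) for _ in range(12)])
```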



No such result as Theorem 3.20 holds if either $\mathcal X$ or $\mathcal Y$ is not of strong negative type:

Proposition 3.24. If $\mathcal X$ is not of negative type, then for every metric space $\mathcal Y$ with at least two points, there exists $\theta\in M_1(\mathcal X\times\mathcal Y)$ whose marginals have finite first moments and such that $\mathrm{dcov}(\theta) < 0$. If $\mathcal X$ is not of strong negative type, then for every metric space $\mathcal Y$ with at least two points, there exists $\theta\in M_1(\mathcal X\times\mathcal Y)$ whose marginals have finite first moments and such that $\mathrm{dcov}(\theta) = 0$, yet $\theta$ is not a product measure.

Proof. Choose two distinct points $y_1,y_2\in\mathcal Y$. Let $\mu_1\ne\mu_2\in M_1(\mathcal X)$ have finite first moments and satisfy $D(\mu_1-\mu_2)\ge0$, where $>$ applies if $\mathcal X$ does not have negative type. In this latter case, set $\theta := \bigl(\mu_1\times\delta(y_1) + \mu_2\times\delta(y_2)\bigr)/2$. Then a little algebra reveals that

$$\mathrm{dcov}(\theta) = -d(y_1,y_2)\,D(\mu_1-\mu_2)/8 < 0 .$$

In general, note that if $x_1\ne x_2$, then $D\bigl(\delta(x_1)-\delta(x_2)\bigr) < 0$, whence there is some $\gamma\in(0,1]$ such that if $\tau_i := \gamma\mu_i + (1-\gamma)\delta(x_i)$, then $D(\tau_1-\tau_2) = 0$. Set $\theta := \bigl(\tau_1\times\delta(y_1) + \tau_2\times\delta(y_2)\bigr)/2$. Then

$$\mathrm{dcov}(\theta) = -d(y_1,y_2)\,D(\tau_1-\tau_2)/8 = 0 ,$$

yet $\theta$ is not a product measure. $\blacksquare$
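The first construction can be tried on a small metric space that is not of negative type. We use the graph metric of the complete bipartite graph $K_{2,3}$ (our choice of example): with $\mu_1$ uniform on the two-vertex side and $\mu_2$ uniform on the three-vertex side, $D(\mu_1-\mu_2) = 1/3 > 0$. Taking $\mathcal Y$ to be two points at distance $1$, the sketch builds $\theta = \bigl(\mu_1\times\delta(y_1) + \mu_2\times\delta(y_2)\bigr)/2$ and evaluates dcov from double-centred distances (our assumed form of the Section 2 definition); by the formula in the proof it should equal $-(1/3)/8 = -1/24$.

```python
def centred(d, p):
    """d_p(i, k) = d(i, k) - a_p(i) - a_p(k) + D(p), where a_p(i) = sum_k d(i, k) p(k)."""
    n = len(p)
    a = [sum(d[i][k] * p[k] for k in range(n)) for i in range(n)]
    big_d = sum(a[i] * p[i] for i in range(n))
    return [[d[i][k] - a[i] - a[k] + big_d for k in range(n)] for i in range(n)]

def dcov_finite(theta, dx, dy):
    nx, ny = len(theta), len(theta[0])
    mu = [sum(row) for row in theta]
    nu = [sum(theta[i][j] for i in range(nx)) for j in range(ny)]
    cm, cn = centred(dx, mu), centred(dy, nu)
    return sum(theta[i][j] * theta[k][l] * cm[i][k] * cn[j][l]
               for i in range(nx) for j in range(ny) for k in range(nx) for l in range(ny))

def D(lam, d):
    """D(lambda) = sum_{i,k} lambda_i lambda_k d(i, k)."""
    return sum(lam[i] * lam[k] * d[i][k] for i in range(len(lam)) for k in range(len(lam)))

side = [0, 0, 1, 1, 1]    # K_{2,3}: vertices 0, 1 on one side, 2, 3, 4 on the other
dx = [[0 if i == k else (2 if side[i] == side[k] else 1) for k in range(5)] for i in range(5)]
dy = [[0, 1], [1, 0]]     # Y = {y_1, y_2} with d(y_1, y_2) = 1
mu1 = [1 / 2, 1 / 2, 0, 0, 0]
mu2 = [0, 0, 1 / 3, 1 / 3, 1 / 3]
theta = [[mu1[i] / 2, mu2[i] / 2] for i in range(5)]
D12 = D([a - b for a, b in zip(mu1, mu2)], dx)    # 1/3 > 0, so K_{2,3} is not of negative type
dcov = dcov_finite(theta, dx, dy)                  # -1/24 up to rounding
```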

There remains the possibility that the kernel $h$ in the proof of Proposition 2.6 is degenerate of order 1 only when $\theta$ is a product measure. If that is true, then Corollary 2.8 gives a consistent test for independence even in metric spaces not of negative type, since when $h$ is not degenerate and $\mathrm{dcov}(\theta) = 0$, $\sqrt n\,\mathrm{dcov}(\theta_n)$ has a non-trivial normal limit in distribution, whence $n\,\mathrm{dcov}(\theta_n)\to\infty$ a.s. We have not investigated this possibility.

Since every Euclidean space is of strict negative type, so is every Hilbert space. Separable Hilbert spaces are even of strong negative type, though this is considerably more subtle. Therefore, $\mathrm{dcov}(\theta) = 0$ implies that $\theta\in M_1(\mathcal X\times\mathcal Y)$ is a product measure when $\mathcal X$ and $\mathcal Y$ are separable Hilbert spaces, which resolves a question of Kosorok.

Theorem 3.25. Every separable Hilbert space is of strong negative type. 



Proof. This follows from Remark 3.13 and Theorem 6 of Linde (1986a) or Theorem 1 of Koldobskii (1982), who prove more. Likewise, separable $L^p$ spaces with $1<p<2$ are of strong negative type. However, we give a direct proof that is shorter, which keeps our paper self-contained.

Our proof relies on a known Gaussian variant of the Crofton embedding. Let $Z_n$ ($n\ge1$) be IID standard normal random variables with law $\rho$ on $\mathbb R^\infty$. Given $u = (u_n;\ n\in\mathbb Z^+)\in\ell^2(\mathbb Z^+)$, define the random variable $Z(u) := \sum_{n\ge1}u_nZ_n$. Then $Z(u)$ is a centered normal random variable with standard deviation equal to $\|u\|_2$. Therefore, $\mathbf E[|Z(u)|] = c\|u\|_2$ with $c := \mathbf E[|Z_1|]$.
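The identity $\mathbf E[|Z(u)|] = c\|u\|_2$, with $c = \sqrt{2/\pi}$ for a standard normal, is easy to check by simulation for a finitely supported $u$; it is also what makes the embedding below an isometry, since $Z(u)-Z(u') = Z(u-u')$. The sketch is a plain Monte Carlo estimate; the vector, sample size, and seed are arbitrary choices of ours.

```python
import math
import random

def mean_abs_Z(u, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E|Z(u)|, where Z(u) = sum_n u_n Z_n with Z_n IID standard normal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += abs(sum(un * rng.gauss(0.0, 1.0) for un in u))
    return total / n_samples

c = math.sqrt(2 / math.pi)                 # E|Z_1| for a standard normal Z_1
u = [3.0, -1.0, 0.5, 0.25]                 # a finitely supported element of l^2
norm_u = math.sqrt(sum(x * x for x in u))
ratio = mean_abs_Z(u) / (c * norm_u)       # close to 1, up to Monte Carlo error
```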
Let $\lambda$ be Lebesgue measure on $\mathbb R$. For $w,u\in\mathbb R^\infty$, write $w(u) := \limsup_{N\to\infty}\sum_{n=1}^N u_nw_n$. We choose $\ell^2(\mathbb Z^+)$ as our separable Hilbert space, which we embed into another Hilbert space, $L^2(\mathbb R^\infty\times\mathbb R,\ \rho\times\lambda)$, by

$$\phi(u) : (w,s)\mapsto \mathbf 1_{[w(u)/c,\,\infty)}(s) - \mathbf 1_{[0,\infty)}(s).$$

Then $\|\phi(u)-\phi(u')\|_2^2 = \|\phi(u)-\phi(u')\|_1 = \|u-u'\|_2$ for all $u,u'\in\ell^2(\mathbb Z^+)$. Let $\mu_1,\mu_2\in M_1(\ell^2(\mathbb Z^+))$ have finite first moments. Set $\mu := \mu_1-\mu_2$. Because $\int d\mu = 0$, we have

$$\beta_\phi(\mu) : (w,s)\mapsto \mu\{u\,;\ w(u)\le cs\}.$$

Note that since for every $u\in\ell^2(\mathbb Z^+)$ the series $w(u)$ converges $\rho$-a.s., Fubini's theorem tells us that for $\rho$-a.e. $w$, $w(u)$ converges for $\mu_i$-a.e. $u$. We need to show that if $\beta_\phi(\mu) = 0$ $\rho\times\lambda$-a.s., then $\mu = 0$. So assume that $\beta_\phi(\mu) = 0$ $\rho\times\lambda$-a.s. It suffices to show that $\mu\{u\,;\ \langle u,v\rangle\le s\} = 0$ for every finitely supported $v\in\mathbb R^\infty$ and every $s\in\mathbb R$, since that implies that the finite-dimensional marginals of $\mu$ are $0$ by the Cramér–Wold device.

Let $K\ge1$. For $w\in\mathbb R^\infty$, write $w_{\le K}$ for the vector $(w_1,\ldots,w_K)\in\mathbb R^K$ and $w_{>K}$ for $(w_{K+1},w_{K+2},\ldots)\in\mathbb R^\infty$. Since the law $\rho$ of $w = (w_{\le K},w_{>K})$ is a product measure, with $\lambda^K$ absolutely continuous with respect to the first factor and with the second factor equal to $\rho$, Fubini's theorem gives that for $\rho$-a.e. $w$, for $\lambda^K$-a.e. $v\in\mathbb R^K$, and for $\lambda$-a.e. $s\in\mathbb R$, we have $\beta_\phi(\mu)\bigl((v,w),s\bigr) = 0$. Since $(v,s)\mapsto\beta_\phi(\mu)\bigl((v,w),s\bigr)$ possesses sufficient continuity properties, we have that for $\rho$-a.e. $w$, for all $v\in\mathbb R^K$ and all $s\in\mathbb R$, $\beta_\phi(\mu)\bigl((v,w),s\bigr) = 0$.

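Let $\varepsilon>0$. Choose $K$ so large that $c\int\|u_{>K}\|_2\,d\mu_i(u) < \varepsilon^2$ for $i = 1,2$, which is possible by Lebesgue's dominated convergence theorem and the fact that $\mu_i$ has finite first moment. Let

$$A(\varepsilon) := \bigl\{(u,w)\in\ell^2(\mathbb Z^+)\times\mathbb R^\infty\,;\ |w(u_{>K})|>\varepsilon\bigr\}.$$

Markov's inequality yields that

$$(\mu_i\times\rho)A(\varepsilon) \le \varepsilon^{-1}\bigl\|w(u_{>K})\bigr\|_{L^1(\mu_i\times\rho)} = \varepsilon^{-1}c\int\|u_{>K}\|_2\,d\mu_i(u) < \varepsilon ,$$

where the equality arises from Fubini's theorem. Therefore, there is some $w$ such that, denoting $A(w,\varepsilon) := \{u\,;\ |w(u_{>K})|>\varepsilon\}$, we have $\beta_\phi(\mu)\bigl((v,w),s\bigr) = 0$ for all $v\in\mathbb R^K$, $s\in\mathbb R$ and

$$\mu_iA(w,\varepsilon) < \varepsilon .$$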

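For such a $w$, we have for all $v$, $s$ that

$$\mu_i\{u\,;\ \langle u_{\le K},v\rangle\le s-\varepsilon\} - \varepsilon \le \mu_i\{u\,;\ \langle u_{\le K},v\rangle + w(u_{>K})\le s\} \le \mu_i\{u\,;\ \langle u_{\le K},v\rangle\le s+\varepsilon\} + \varepsilon .$$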
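The middle quantity is the same for $i = 1$ as for $i = 2$ by choice of $w$. Therefore, for all $v\in\mathbb R^K$ and $s\in\mathbb R$,

$$\mu_1[\langle u_{\le K},v\rangle\le s-\varepsilon] - \varepsilon \le \mu_2[\langle u_{\le K},v\rangle\le s+\varepsilon] + \varepsilon$$

and

$$\mu_2[\langle u_{\le K},v\rangle\le s-\varepsilon] - \varepsilon \le \mu_1[\langle u_{\le K},v\rangle\le s+\varepsilon] + \varepsilon .$$

Although $K$ depends on $\varepsilon$, it follows that for all $L\le K$ and all $v\in\mathbb R^L$, $s\in\mathbb R$,

$$\mu_1[\langle u_{\le L},v\rangle\le s-\varepsilon] - \varepsilon \le \mu_2[\langle u_{\le L},v\rangle\le s+\varepsilon] + \varepsilon$$

and

$$\mu_2[\langle u_{\le L},v\rangle\le s-\varepsilon] - \varepsilon \le \mu_1[\langle u_{\le L},v\rangle\le s+\varepsilon] + \varepsilon .$$

Thus, if we fix $L$, the above inequalities hold for all $\varepsilon>0$, which implies that

$$\mu_1[\langle u_{\le L},v\rangle\le s] = \mu_2[\langle u_{\le L},v\rangle\le s] .$$

This is what we needed to show. $\blacksquare$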

Non-separable Hilbert spaces $H$ are of strong negative type iff their dimension is a cardinal of measure zero. (Whether there exist cardinals not of measure zero is a subtle question that involves foundational issues; see Chapter 23 of Just and Weese (1997).) To see this equivalence, note first that if every Borel probability measure on $H$ is carried by a separable subset, then $H$ has strong negative type by the preceding theorem. Now a theorem of Marczewski and Sikorski (1948) (or see Theorem 2 of Appendix III in Billingsley (1968)) implies that this separable-carrier condition holds if (and only if) the dimension of $H$ is a cardinal of measure zero. Conversely, if the dimension of $H$ is not a cardinal of measure zero, then let $I$ be an orthonormal basis of $H$. By definition, there exists a probability measure $\mu$ on the subsets of $I$ that vanishes on singletons. Write $I = I_1\cup I_2$, where $I_1$ and $I_2$ are disjoint and equinumerous with $I$. Define $\mu_j$ ($j = 1,2$) on $I_j$ by pushing forward $\mu$ via a bijection from $I$ to $I_j$. Extend $\mu_j$ to $H$ in the obvious way (all subsets of $I$ are Borel in $H$ since they are $G_\delta$-sets). Then $\mu_1\ne\mu_2$, yet $D(\mu_1-\mu_2) = 0$.

Corollary 3.26. If $(\mathcal X,d)$ is a separable metric space of negative type, then $(\mathcal X,d^{1/2})$ is a metric space of strong negative type.

Proof. Let $\phi : (\mathcal X,d^{1/2})\to H$ be an isometric embedding to a separable Hilbert space. Let $\psi : (H,\|\cdot\|^{1/2})\to H'$ be an isometric embedding to another separable Hilbert space such that $\beta_\psi$ is injective on $M_1^1(H)$, which exists by Theorem 3.25. Then $\psi\circ\phi : (\mathcal X,d^{1/4})\to H'$ is an isometric embedding to a Hilbert space whose barycenter map is injective on $M_1^1(\mathcal X,d^{1/2})$. $\blacksquare$

This means that we can apply a distance covariance test of independence to any pair of metric spaces of negative type provided we use square roots of distances in place of distances. This even has the small advantage that the probability measures in question need have only finite half-moments.



Remark 3.27. In fact, Linde (1986a) proves that the map $\mu\mapsto a_\mu$ of Remark 3.13 is injective on $M_1^1(H,\|\cdot\|^r)$ for all $r\in\mathbb R^+\setminus2\mathbb N$. It follows that if $(\mathcal X,d)$ has negative type, then $(\mathcal X,d^r)$ has strong negative type when $0<r<1$. For let $\phi : (\mathcal X,d^{1/2})\to H$ be an isometric embedding. By Linde's result, the map

$$\mu\mapsto\Bigl(x\mapsto\int d(x,x')^r\,d\mu(x') = \int\|\phi(x)-\phi(x')\|^{2r}\,d\mu(x')\Bigr)$$

is injective. Since $(\mathcal X,d^r)$ has negative type by a theorem of Schoenberg (1938), the claim follows from Remark 3.13.
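Schoenberg's theorem can be probed numerically in the plane: for Euclidean distance $d$ and $0<r\le1$, $\sum_{i,j}w_iw_j\,d(x_i,x_j)^r\le0$ whenever $\sum_iw_i = 0$. The sketch below samples random points and zero-sum weights (sizes and seed are arbitrary choices of ours) and records the largest value seen; it only illustrates the negative-type property of $d^r$, not Linde's injectivity result.

```python
import math
import random

def D(points, weights, r):
    """sum_{i,j} w_i w_j |x_i - x_j|^r for points in the plane."""
    return sum(wi * wj * math.dist(p, q) ** r
               for p, wi in zip(points, weights) for q, wj in zip(points, weights))

rng = random.Random(4)
worst = -math.inf
for _ in range(200):
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
    w = [rng.gauss(0, 1) for _ in range(6)]
    mean = sum(w) / len(w)
    w = [x - mean for x in w]              # total mass zero
    for r in (0.25, 0.5, 1.0):
        worst = max(worst, D(pts, w, r))
print(worst)   # never positive, up to rounding
```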

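Corollary 3.28. If $(\mathcal X,d_{\mathcal X})$ and $(\mathcal Y,d_{\mathcal Y})$ are metric spaces of negative type, then $\bigl(\mathcal X\times\mathcal Y,(d_{\mathcal X}+d_{\mathcal Y})^{1/2}\bigr)$ is a metric space of strong negative type.

Proof. It is easy to see that $(\mathcal X\times\mathcal Y,d_{\mathcal X}+d_{\mathcal Y})$ is of negative type, whence the result follows from Corollary 3.26. $\blacksquare$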



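Thus, another way to test independence for metric spaces $(\mathcal X,d_{\mathcal X})$ and $(\mathcal Y,d_{\mathcal Y})$ of negative type (not necessarily strong) uses not $\mathrm{dcov}(\theta)$, but $D(\theta-\mu\times\nu)$ with respect to the metric $(d_{\mathcal X}+d_{\mathcal Y})^{1/2}$ on $\mathcal X\times\mathcal Y$; compare Remark 3.23. By Remark 3.27, the same holds for $\bigl(\mathcal X\times\mathcal Y,(d_{\mathcal X}+d_{\mathcal Y})^r\bigr)$ with any $r\in(0,1)$.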
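A sketch of the statistic $D(\theta_n-\mu_n\times\nu_n)$ for the metric $(d_{\mathcal X}+d_{\mathcal Y})^r$, here with a categorical $\mathcal X$ (discrete metric) and $\mathcal Y = \mathbb R$ (absolute difference). The data, names, and $r$ values are hypothetical; the code only evaluates the double sum, and by Corollary 3.28 and Remark 3.27 the result should be strictly negative whenever $\theta_n$ is not a product measure.

```python
def D_product_metric(pairs, d_x, d_y, r):
    """D(theta_n - mu_n x nu_n) for the metric (d_x + d_y)^r on X x Y."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    atoms = [((x, y), 1 / n) for x, y in pairs]                   # theta_n
    atoms += [((x, y), -1 / n ** 2) for x in xs for y in ys]      # -(mu_n x nu_n)
    def dist(p, q):
        return (d_x(p[0], q[0]) + d_y(p[1], q[1])) ** r
    return sum(w1 * w2 * dist(p1, p2) for p1, w1 in atoms for p2, w2 in atoms)

def discrete(a, b):
    return 0.0 if a == b else 1.0

def absolute(a, b):
    return abs(a - b)

pairs = [("a", 0.1), ("a", 0.3), ("a", -0.2), ("b", 2.0), ("b", 2.4), ("c", 5.0), ("c", 4.1)]
values = {r: D_product_metric(pairs, discrete, absolute, r) for r in (0.25, 0.5, 0.75)}
```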

We remark finally that for separable metric spaces of negative type, the proofs of Proposition 2.6, Theorem 2.7, and Corollary 2.8 are more straightforward, as they can rely on the strong law of large numbers and the central limit theorem in Hilbert space.